BACKGROUND OF THE INVENTION

This invention relates generally to methods for
synthesizing and expressing oligonucleotides and, more
particularly, to methods for expressing oligonucleotides
having random codon sequences.

Oligonucleotide synthesis proceeds via linear coupling 
of individual monomers in a stepwise reaction. The 
reactions are generally performed on a solid phase support 
by first coupling the 3' end of the first monomer to the
support. The second monomer is added to the 5' end of the 
first monomer in a condensation reaction to yield a 
dinucleotide coupled to the solid support. At the end of 
each coupling reaction, the by-products and unreacted, free 
monomers are washed away so that the starting material for 
the next round of synthesis is the pure oligonucleotide 
attached to the support. In this reaction scheme, the 
stepwise addition of individual monomers to a single, 
growing end of an oligonucleotide ensures accurate synthesis
of the desired sequence. Moreover, unwanted side reactions
are eliminated, such as the condensation of two
oligonucleotides, resulting in high product yields.

In some instances, it is desired that synthetic
oligonucleotides have random nucleotide sequences. This 
result can be accomplished by adding equal proportions of 
all four nucleotides in the monomer coupling reactions, 
leading to the random incorporation of all nucleotides and 
yielding a population of oligonucleotides with random 
sequences. Since all possible combinations of nucleotide 
sequences are represented within the population, all 
possible codon triplets will also be represented. If the
objective is ultimately to generate random peptide
products, this approach has a severe limitation because the
random codons synthesized will bias the amino acids
incorporated during translation of the DNA by the cell into
polypeptides.

WO 92/06176

The bias is due to the redundancy of the genetic code.
There are four nucleotide monomers, which leads to
sixty-four possible triplet codons. With only twenty amino
acids to specify, many of the amino acids are encoded by
multiple codons. Therefore, a population of
oligonucleotides synthesized by sequential addition of
monomers from a random population will not encode peptides
whose amino acid sequence represents all possible
combinations of the twenty different amino acids in equal
proportions. That is, the frequency of amino acids
incorporated into polypeptides will be biased toward those
amino acids which are specified by multiple codons.

To alleviate amino acid bias due to the redundancy of
the genetic code, the oligonucleotides can be synthesized
from nucleotide triplets. Here, a triplet coding for each
of the twenty amino acids is synthesized from individual
monomers. Once synthesized, the triplets are used in the
coupling reactions instead of individual monomers. By
mixing equal proportions of the triplets, synthesis of
oligonucleotides with random codons can be accomplished.
However, the cost of synthesis from such triplets
exceeds that of synthesis from individual monomers because
triplets are not commercially available.

Amino acid bias can be reduced, however, by
synthesizing the degenerate codon sequence NNK where N is
a mixture of all four nucleotides and K is a mixture of
guanine and thymine nucleotides. Each position within an
oligonucleotide having this codon sequence will contain a
total of 32 codons (12 encoding amino acids being
represented once, 5 represented twice, 3 represented three
times and one codon being a stop codon). Oligonucleotides
expressed with such degenerate codon sequences will produce
peptide products whose sequences are biased toward those
amino acids being represented more than once. Thus,
populations of peptides whose sequences are completely
random cannot be obtained from oligonucleotides synthesized
from degenerate sequences.
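
The codon counts quoted above for the degenerate NNK sequence can be checked by enumeration. The following is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE`, `degenerate_codons` and `redundancy_profile` are illustrative and are not taken from the specification.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional T, C, A, G order.
# '*' marks a stop codon. The table is an assumption of this sketch.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def degenerate_codons(first="TCAG", second="TCAG", third="TCAG"):
    """List every codon obtainable from the given monomer mixtures."""
    return ["".join(p) for p in product(first, second, third)]


def redundancy_profile(codons):
    """Return ({times represented: number of amino acids}, stop count)."""
    per_amino_acid = Counter(CODON_TABLE[c] for c in codons)
    stops = per_amino_acid.pop("*", 0)
    return dict(Counter(per_amino_acid.values())), stops


# NNK: N is all four nucleotides, K is guanine or thymine.
nnk = degenerate_codons(third="GT")
nnk_profile, nnk_stops = redundancy_profile(nnk)

# NNN: fully random monomer coupling at all three positions.
nnn_profile, nnn_stops = redundancy_profile(degenerate_codons())
```

Under these assumptions, `nnk_profile` groups the amino acids by the number of NNK codons that encode them, and `nnn_profile` does the same for fully random monomer coupling; the unequal group sizes are the bias discussed in this section.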

There thus exists a need for a method to express
oligonucleotides having a fully random or desirably biased
sequence which alleviates genetic redundancy. The present
invention satisfies these needs and provides additional
advantages as well.

SUMMARY OF THE INVENTION

The invention provides a plurality of procaryotic 
cells containing a diverse population of expressible 
oligonucleotides operationally linked to expression 
elements, the expressible oligonucleotides having a 
desirable bias of random codon sequences. 

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic drawing for synthesizing 
oligonucleotides from nucleotide monomers with random 
tuplets at each position using twenty reaction vessels. 

Figure 2 is a schematic drawing for synthesizing
oligonucleotides from nucleotide monomers with random
tuplets at each position using ten reaction vessels.

Figure 3 is a schematic diagram of the two vectors
used for sublibrary and library production from precursor
oligonucleotide portions. M13IX22 (Figure 3A) is the
vector used to clone the anti-sense precursor portions
(hatched box). The single-headed arrow represents the Lac
p/o expression sequences and the double-headed arrow
represents the portion of M13IX22 which is to be combined
with M13IX42. The amber stop codon for biological
selection and relevant restriction sites are also shown.
M13IX42 (Figure 3B) is the vector used to clone the sense
precursor portions. [illegible] represent the pseudo-wild
type gVIII and wild type gVIII (gene VIII) sequences. The
double-headed arrow represents the portion of M13IX42
which is to be combined with M13IX22. The two amber stop
codons and relevant restriction sites are also shown.
Figure 3C shows the joining of vector populations from
sublibraries to form the functional surface expression
vector. Figure 3D shows the generation of a surface
expression library in a non-suppressor strain and the
production of phage. The phage are used to infect a
suppressor strain (Figure 3E) for surface expression and
screening of the library.

Figure 4 is a schematic diagram of the vector used for
generation of surface expression libraries from random
oligonucleotide populations (M13IX30). The symbols are as
described for Figure 3.

Figure 5 is the nucleotide sequence of M13IX42 (SEQ ID
NO: 1).

Figure 6 is the nucleotide sequence of M13IX22 (SEQ ID
NO: 2).

Figure 7 is the nucleotide sequence of M13IX30 (SEQ ID
NO: 3).

Figure 8 is the nucleotide sequence of M13ED03 (SEQ ID
NO: 4).

Figure 9 is the nucleotide sequence of M13IX421 (SEQ ID
NO: 5).

Figure 10 is the nucleotide sequence of M13ED04 (SEQ ID
NO: 6).



DETAILED DESCRIPTION OF THE INVENTION

This invention is directed to a simple and inexpensive
method for synthesizing and expressing oligonucleotides
having a desirable bias of random codons using individual
monomers. The method is advantageous in that individual
monomers are used instead of triplets and by synthesizing
only a non-degenerate subset of all triplets, codon
redundancy is alleviated. Thus, the oligonucleotides
synthesized represent a large proportion of possible random
triplet sequences which can be obtained. The
oligonucleotides can be expressed, for example, on the
surface of filamentous bacteriophage in a form which does
not alter phage viability or impose biological selections
against certain peptide sequences. The oligonucleotides
produced are therefore useful for generating an unlimited
number of pharmacological and research products.

In one embodiment, the invention entails the
sequential coupling of monomers to produce oligonucleotides
with a desirable bias of random codons. The coupling
reactions for the randomization of twenty codons which
specify the amino acids of the genetic code are performed
in ten different reaction vessels. Each reaction vessel
contains a support on which the monomers for two different
codons are coupled in three sequential reactions. One of
the reactions couples an equal mixture of two monomers such
that the final product has two different codon sequences.
The codons are randomized by removing the supports from the
reaction vessels and mixing them to produce a single batch
of supports containing all twenty codons at a particular
position. Synthesis at the next codon position proceeds by
equally dividing the mixed batch of supports into ten
reaction vessels as before and sequentially coupling the
monomers for each pair of codons. The supports are again
mixed to randomize the codons at the position just
synthesized. The cycle of coupling, mixing and dividing
continues until the desired number of codon positions have
been randomized. After the last position has been
randomized, the oligonucleotides with random codons are
cleaved from the support. The random oligonucleotides can
then be expressed, for example, on the surface of
filamentous bacteriophage as gene VIII-peptide fusion
proteins. Alternative genes can be used as well.

In its broadest form, the invention provides a diverse
population of synthetic oligonucleotides contained in
vectors so as to be expressible in cells. Such populations
of diverse oligonucleotides can be fully random at one or
more codon sites or can be fully defined at one or more
sites, so long as the codons at at least one site are
randomly variable. The populations of oligonucleotides can
be expressed as fusion products in combination with surface
proteins of filamentous bacteriophage, such as M13, via
gene VIII. The vectors can be transfected into a plurality
of cells, such as the procaryote E. coli.

The diverse population of oligonucleotides can be
formed by randomly combining first and second precursor
populations, each precursor population having a desirable
bias of random codon sequences. Methods of synthesizing
and expressing the diverse population of expressible
oligonucleotides are also provided.

In a preferred embodiment, two populations of random
oligonucleotides are synthesized. The oligonucleotides
within each population encode a portion of the final
oligonucleotide which is to be expressed. Oligonucleotides
within one population encode the carboxy terminal portion
of the expressed oligonucleotides. These oligonucleotides
are cloned in frame with a gene VIII (gVIII) sequence so
that translation of the sequence produces peptide fusion
proteins. The second population of oligonucleotides is
cloned into a separate vector. Each oligonucleotide within
this population encodes the anti-sense of the amino
terminal portion of the expressed oligonucleotides. This
vector also contains the elements necessary for expression.
The two vectors containing the random oligonucleotides are
combined such that the two precursor oligonucleotide
portions are joined together at random to form a population
of larger oligonucleotides derived from two smaller
portions. The vectors contain selectable markers to ensure
maximum efficiency in joining together the two
oligonucleotide populations. A mechanism also exists to
control the expression of gVIII-peptide fusion proteins
during library construction and screening.

As used herein, the term "monomer" or "nucleotide
monomer" refers to individual nucleotides used in the
chemical synthesis of oligonucleotides. Monomers that can
be used include both the ribo- and deoxyribo- forms of each
of the five standard nucleotides (derived from the bases
adenine (A or dA, respectively), guanine (G or dG),
cytosine (C or dC), thymine (T) and uracil (U)).
Derivatives and precursors of bases such as inosine which
are capable of supporting polypeptide biosynthesis are also
included as monomers. Also included are chemically
modified nucleotides, for example, one having a reversible
blocking agent attached to any of the positions on the
purine or pyrimidine bases, the ribose or deoxyribose sugar
or the phosphate or hydroxyl moieties of the monomer. Such
blocking groups include, for example, dimethoxytrityl,
benzoyl, isobutyryl, beta-cyanoethyl and diisopropylamine
groups, and are used to protect hydroxyls, exocyclic amines
and phosphate moieties. Other blocking agents can also be
used and are known to one skilled in the art.



As used herein, the term "tuplet" refers to a group
of elements of a definable size. The elements of a tuplet
as used herein are nucleotide monomers. For example, a
tuplet can be a dinucleotide, a trinucleotide or can also
be four or more nucleotides.

As used herein, the term "codon" or "triplet" refers
to a tuplet consisting of three adjacent nucleotide
monomers which specify one of the twenty naturally
occurring amino acids found in polypeptide biosynthesis.
The term also includes nonsense, or stop, codons which do
not specify any amino acid.

"Random codons" or "randomized codons," as used
herein, refers to more than one codon at a position within
a collection of oligonucleotides. The number of different
codons can be from two to twenty at any particular
position. "Randomized oligonucleotides," as used herein,
refers to a collection of oligonucleotides with random
codons at one or more positions. "Random codon sequences"
as used herein means that more than one codon position
within a randomized oligonucleotide contains random codons.
For example, if randomized oligonucleotides are six
nucleotides in length (i.e., two codons) and both codon
positions are randomized to encode all twenty amino acids,
then the oligonucleotides having every possible combination
of the twenty triplets at the first and second positions
make up the population of randomized oligonucleotides. The
number of possible codon combinations is 20^2. Likewise,
if randomized oligonucleotides of fifteen nucleotides in
length are synthesized which have random codon sequences at
all positions encoding all twenty amino acids, then the
population constituting the randomized oligonucleotides
will contain 20^5 different possible species of
oligonucleotides. "Random tuplets," or "randomized
tuplets" are defined analogously.
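
The six- and fifteen-nucleotide examples in the definition above follow from a single rule: the number of possible species is the number of codons per position raised to the number of codon positions. A minimal sketch of that arithmetic follows; the function name `species_count` is illustrative only.

```python
def species_count(length_nt, codons_per_position=20):
    """Number of distinct oligonucleotides when every codon position is
    randomized over `codons_per_position` codons."""
    if length_nt % 3 != 0:
        raise ValueError("length must be a whole number of codons")
    n_positions = length_nt // 3
    return codons_per_position ** n_positions


two_codon_species = species_count(6)     # two codon positions
five_codon_species = species_count(15)   # five codon positions
```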

As used herein, the term "bias" refers to a 
preference. It is understood that there can be degrees of 
preference or bias toward codon sequences which encode 
particular amino acids. For example, an oligonucleotide 
whose codon sequences do not preferably encode particular 
amino acids is unbiased and therefore completely random. 
The oligonucleotide codon sequences can also be biased 
toward predetermined codon sequences or codon frequencies 
and while still diverse and random, will exhibit codon 
sequences biased toward a defined, or preferred, sequence. 
"A desirable bias of random codon sequences" as used 
herein, refers to the predetermined degree of bias which 
can be selected from totally random to essentially, but not 
totally, defined (or preferred). There must be at least 
one codon position which is variable, however. 

As used herein, the term "support" refers to a solid
phase material for attaching monomers for chemical
synthesis. Such support is usually composed of materials
such as beads of controlled pore glass but can be other
materials known to one skilled in the art. The term is
also meant to include one or more monomers coupled to the
support for additional oligonucleotide synthesis reactions.

As used herein, the terms "coupling" or "condensing"
refer to the chemical reactions for attaching one monomer
to a second monomer or to a solid support. Such reactions
are known to one skilled in the art and are typically
performed on an automated DNA synthesizer such as a
MilliGen/Biosearch Cyclone Plus Synthesizer using
procedures recommended by the manufacturer. "Sequentially
coupling" as used herein, refers to the stepwise addition
of monomers.

A method of synthesizing oligonucleotides having
random tuplets using individual monomers is described. The
method consists of several steps, the first being synthesis
of a nucleotide tuplet for each tuplet to be randomized.
As described here and below, a nucleotide triplet (i.e., a
codon) will be used as a specific example of a tuplet. Any
size tuplet will work using the methods disclosed herein,
and one skilled in the art would know how to use the
methods to randomize tuplets of any size.

If the randomization of codons specifying all twenty 
amino acids is desired at a position, then twenty different 
codons are synthesized. Likewise, if randomization of only 
ten codons at a particular position is desired then those 
ten codons are synthesized. Randomization of codons from 
two to sixty-four can be accomplished by synthesizing each
desired triplet. Preferably, randomization of from two to 
twenty codons is used for any one position because of the 
redundancy of the genetic code. The codons selected at one 
position do not have to be the same codons selected at the 
next position. Additionally, the sense or anti-sense 
sequence oligonucleotide can be synthesized. The process 
therefore provides for randomization of any desired codon 
position with any number of codons. 

Codons to be randomized are synthesized sequentially
by coupling the first monomer of each codon to separate
supports. The supports for the synthesis of each codon
can, for example, be contained in different reaction
vessels such that one reaction vessel corresponds to the
monomer coupling reactions for one codon. As will be used
here and below, if twenty codons are to be randomized, then
twenty reaction vessels can be used in independent coupling
reactions for the first monomer of each of the twenty
codons. Synthesis proceeds by sequentially coupling the
second monomer of each codon to the first monomer to
produce a dimer, followed by coupling the third monomer for
each codon to each of the above-synthesized dimers to
produce a trimer (Figure 1, step 1, where M1, M2 and M3
represent the first, second and third monomer,
respectively, for each codon to be randomized).

Following synthesis of the first codons from
individual monomers, the randomization is achieved by
mixing the supports from all twenty reaction vessels which
contain the individual codons to be randomized. The solid
phase support can be removed from its vessel and mixed to
achieve a random distribution of all codon species within
the population (Figure 1, step 2). The mixed population of
supports, constituting all codon species, is then
redistributed into twenty independent reaction vessels
(Figure 1, step 3). The resultant vessels are all
identical and contain equal portions of all twenty codons
coupled to a solid phase support.

For randomization of the second position codon,
synthesis of twenty additional codons is performed, one in
each of the twenty reaction vessels produced in step 3,
with the supports from step 3 serving as the condensing
substrates (Figure 1, step 4). Steps 1 and 4 are therefore
equivalent except that step 4 uses the supports produced by
the previous synthesis cycle (steps 1 through 3) for codon
synthesis whereas step 1 is the initial synthesis of the
first codon in the oligonucleotide. The supports resulting
from step 4 will each have two codons attached to them
(i.e., a hexanucleotide) with the codon at the first
position being any one of twenty possible codons (i.e.,
random) and the codon at the second position being one of
the twenty possible codons.

For randomization of the codon at the second position
and synthesis of the third position codon, steps 2 through
4 are again repeated. This process yields in each vessel
a three codon oligonucleotide (i.e., 9 nucleotides) with
codon positions 1 and 2 randomized and position three
containing one of the twenty possible codons. Steps 2
through 4 are repeated to randomize the third position
codon and synthesize the codon at the next position. The
process is continued until an oligonucleotide of the
desired length is achieved. After the final randomization
step, the oligonucleotide can be cleaved from the supports
and isolated by methods known to one skilled in the art.
Alternatively, the oligonucleotides can remain on the
supports for use in methods employing probe hybridization.
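
The couple, mix and divide cycle described above (Figure 1, steps 1 through 4) can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the synthesis procedure itself: each bead is modeled as a string of codons in order of addition, coupling is treated as error-free, and the function name `split_and_mix`, the three example codons and the bead counts are hypothetical.

```python
import random
from collections import Counter


def split_and_mix(codons, n_positions, beads_per_vessel, seed=0):
    """Simulate the couple / mix / divide cycle with one codon per vessel.

    Each bead is modeled as a string listing its codons in order of
    addition (first position first). Every oligonucleotide on one bead
    is identical, so one string per bead is sufficient.
    """
    rng = random.Random(seed)
    n_vessels = len(codons)
    beads = [""] * (n_vessels * beads_per_vessel)
    for _ in range(n_positions):
        # Mix all supports (step 2); has no effect on bare supports.
        rng.shuffle(beads)
        next_round = []
        for vessel, codon in enumerate(codons):
            # Divide equally into the vessels (step 3) ...
            start = vessel * beads_per_vessel
            portion = beads[start:start + beads_per_vessel]
            # ... and couple this vessel's codon (steps 1 and 4).
            next_round.extend(seq + codon for seq in portion)
        beads = next_round
    return beads


# Hypothetical small run: three codons, two positions, 100 beads per vessel.
example_beads = split_and_mix(
    ["GCT", "TGG", "AAT"], n_positions=2, beads_per_vessel=100
)
first_position = Counter(bead[:3] for bead in example_beads)
last_position = Counter(bead[3:] for bead in example_beads)
```

In this model every codon appears on an equal share of the beads at every position, while the pairing of codons across positions is set by the random mixing step.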

The diversity of codon sequences, i.e., the number of
different possible oligonucleotides, which can be obtained
using the methods of the present invention, is extremely
large and only limited by the physical characteristics of
available materials. For example, a support composed of
beads of about 100 µm in diameter will be limited to about
10,000 beads/reaction vessel using a 1 µM reaction vessel
containing 25 mg of beads. This size bead can support
about 1 x 10^[illegible] oligonucleotides per bead.
Synthesis using separate reaction vessels for each of the
twenty amino acids will produce beads in which all the
oligonucleotides attached to an individual bead are
identical. The diversity which can be obtained under these
conditions is approximately 10^[illegible] copies of
10,000 x 20 or 200,000 different random oligonucleotides.
The diversity can be increased, however, in several ways
without departing from the basic methods disclosed herein.
For example, the number of possible sequences can be
increased by decreasing the size of the individual beads
which make up the support. A bead of about 30 µm in
diameter will increase the number of beads per reaction
vessel and therefore the number of oligonucleotides
synthesized. Another way to increase the diversity of
oligonucleotides with random codons is to increase the
volume of the reaction vessel. For example, using the same
size bead, a larger volume can contain a greater number of
beads than a smaller vessel and therefore support the
synthesis of a greater number of oligonucleotides.
Increasing the number of codons coupled to a support in a
single reaction vessel also increases the diversity of the
random oligonucleotides. The total diversity will be the
number of codons coupled per vessel raised to the number of
codon positions synthesized. For example, using ten
reaction vessels, each synthesizing two codons to randomize
a total of twenty codons, the number of different
oligonucleotides of ten codons in length per 100 µm bead
can be increased where each bead will contain about 2^10 or
1 x 10^3 different sequences instead of one. One skilled
in the art will know how to modify such parameters to
increase the diversity of oligonucleotides with random
codons.

A method of synthesizing oligonucleotides having
random codons at each position using individual monomers
wherein the number of reaction vessels is less than the
number of codons to be randomized is also described. For
example, if twenty codons are to be randomized at each
position within an oligonucleotide population, then ten
reaction vessels can be used. The use of a smaller number
of reaction vessels than the number of codons to be
randomized at each position is preferred because the
smaller number of reaction vessels is easier to manipulate
and results in a greater number of possible
oligonucleotides synthesized.
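
The diversity figures quoted above for twenty and for ten reaction vessels can be reproduced with two short calculations. This is a sketch only; the function names are illustrative, and the bead count of 10,000 per vessel is the example value given in the text.

```python
def sequences_per_bead(codons_per_vessel, n_positions):
    """Distinct sequences that can share one bead: the number of codons
    coupled per vessel raised to the number of codon positions."""
    return codons_per_vessel ** n_positions


def single_codon_bead_diversity(beads_per_vessel, n_vessels):
    """With one codon per vessel each bead carries a single sequence, so
    the number of different oligonucleotides is capped by the bead count."""
    return beads_per_vessel * n_vessels


# Twenty vessels, one codon each, 10,000 beads per vessel.
twenty_vessel_diversity = single_codon_bead_diversity(10_000, 20)

# Ten vessels, two codons each, oligonucleotides ten codons long.
ten_vessel_per_bead = sequences_per_bead(2, 10)
```

Here `ten_vessel_per_bead` evaluates to 1,024, the "about 1 x 10^3" sequences per bead of the ten-vessel example, and `twenty_vessel_diversity` evaluates to the 200,000 beads of the twenty-vessel example.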

The use of a smaller number of reaction vessels for
random synthesis of twenty codons at a desired position
within an oligonucleotide is similar to that described
above using twenty reaction vessels except that each
reaction vessel can contain the synthesis products of more
than one codon. For example, step one synthesis using ten
reaction vessels proceeds by coupling about two different
codons on supports contained in each of ten reaction
vessels. This is shown in Figure 2 where each of the two
codons coupled to a different support can consist of the
following sequences: (1) (T/G)TT for Phe and Val; (2)
(T/C)CT for Ser and Pro; (3) (T/C)AT for Tyr and His; (4)
(T/C)GT for Cys and Arg; (5) (C/A)T[remaining entries
illegible]. The slash (/) signifies that a mixture of the
monomers indicated on each side of the slash are used as if
they were a single monomer in the indicated coupling step.
The antisense sequence for each of the above codons can be
generated by synthesizing the complementary sequence. For
example, the antisense for Phe and Val can be AA(C/A). The
amino acids encoded by each of the above pairs of sequences
are given in the standard three letter nomenclature.
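
The slash notation for codon pairs, such as (T/C)CT for Ser and Pro, and the antisense example AA(C/A) for Phe and Val can be checked mechanically. The sketch below assumes the standard genetic code; the parser and the names `expand`, `encoded` and `antisense` are illustrative and are not part of the specification. The antisense is written with its mixture as (A/C), which denotes the same monomer mixture as (C/A).

```python
import re
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
)
# Standard genetic code (an assumption of this sketch); '*' is a stop.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
_TOKEN = re.compile(r"\(([ACGT](?:/[ACGT])+)\)|([ACGT])")


def parse(spec):
    """Split a spec such as '(T/C)CT' into per-position monomer tuples."""
    positions = [
        tuple(mixed.split("/")) if mixed else (single,)
        for mixed, single in _TOKEN.findall(spec)
    ]
    if len(positions) != 3:
        raise ValueError(f"not a three-position codon spec: {spec!r}")
    return positions


def expand(spec):
    """List the codons produced by coupling the indicated mixtures."""
    return ["".join(p) for p in product(*parse(spec))]


def encoded(spec):
    """Three-letter names of the amino acids the spec encodes."""
    return [THREE_LETTER[CODON_TABLE[codon]] for codon in expand(spec)]


def antisense(spec):
    """Reverse complement of a spec, keeping mixtures as mixtures."""
    parts = []
    for monomers in reversed(parse(spec)):
        comp = [COMPLEMENT[m] for m in monomers]
        parts.append(comp[0] if len(comp) == 1 else "(" + "/".join(comp) + ")")
    return "".join(parts)


ser_pro = encoded("(T/C)CT")
phe_val_antisense = antisense("(T/G)TT")
```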

Coupling of the monomers in this fashion will yield
codons specifying all twenty of the naturally occurring
amino acids attached to supports in ten reaction vessels.
However, the number of individual reaction vessels to be
used will depend on the number of codons to be randomized
at the desired position and can be determined by one
skilled in the art. For example, if ten codons are to be
randomized, then five reaction vessels can be used for
coupling. The codon sequences given above can be used for
this synthesis as well. The sequences of the codons can
also be changed to incorporate or be replaced by any of the
additional forty-four codons which constitute the genetic
code.

The remaining steps of synthesis of oligonucleotides
with random codons using a smaller number of reaction
vessels are as outlined above for synthesis with twenty
reaction vessels except that the mixing and dividing steps
are performed with supports from about half the number of
reaction vessels. These remaining steps are shown in
Figure 2 (steps 2 through 4).

Oligonucleotides having at least one specified tuplet
at a predetermined position and the remaining positions
having random tuplets can also be synthesized using the
methods described herein. The synthesis steps are similar
to those outlined above using twenty or fewer reaction
vessels except that prior to synthesis of the specified
codon position, the dividing of the supports into separate
reaction vessels for synthesis of different codons is
omitted. For example, if the codon at the second position
of the oligonucleotide is to be specified, then following
synthesis of random codons at the first position and mixing
of the supports, the mixed supports are not divided into
new reaction vessels but, instead, can be contained in a
single reaction vessel to synthesize the specified codon.
The specified codon is synthesized sequentially from
individual monomers as described above. Thus, the number
of reaction vessels can be increased or decreased at each
step to allow for the synthesis of a specified codon or a
desired number of random codons.

Following codon synthesis, the mixed supports are
divided into individual reaction vessels for synthesis of
the next codon to be randomized (Figure 1, step 3) or can
be used without separation for synthesis of a consecutive
specified codon. The rounds of synthesis can be repeated
for each codon to be added until the desired number of
positions with predetermined or randomized codons are
obtained.
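
The choice at each round between dividing the supports (randomized position) and keeping them in one vessel (specified position) can be expressed as a per-position schedule. The following is a simplified model, not the procedure itself: the function name `synthesize`, the codons in the example schedule and the bead count are hypothetical, and coupling is treated as error-free.

```python
import random
from collections import Counter


def synthesize(schedule, n_beads, seed=0):
    """Build bead sequences position by position.

    `schedule` holds one list of codons per position. A position with a
    single codon is a specified position: the mixed supports stay in one
    vessel. A position with several codons is randomized: the supports
    are divided equally, one vessel per codon.
    """
    rng = random.Random(seed)
    beads = [""] * n_beads
    for codons in schedule:
        rng.shuffle(beads)  # mix the supports from the previous round
        n_vessels = len(codons)
        # Bead i goes to vessel i % n_vessels: an equal split after mixing.
        beads = [seq + codons[i % n_vessels] for i, seq in enumerate(beads)]
    return beads


# Hypothetical schedule: random first position, specified second
# position, random third position.
schedule = [["GCT", "TGG"], ["ATG"], ["AAT", "GAT", "CAT"]]
beads = synthesize(schedule, n_beads=600)
third_position = Counter(bead[6:] for bead in beads)
```

Placing a single-codon entry first in the schedule models the case in which the first position codon is the specified one.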

Oligonucleotides with the first position codon being
specified can also be synthesized using the above method.
In this case, the first position codon is synthesized from
the appropriate monomers. The supports are divided into
the required number of reaction vessels needed for
synthesis of random codons at the second position and the
rounds of synthesis, mixing and dividing
are performed as described above.

A method of synthesizing oligonucleotides having
tuplets which are diverse but biased toward a predetermined
sequence is also described herein. This method employs two
reaction vessels, one vessel for the synthesis of a
predetermined sequence and the other vessel for the
synthesis of a random sequence. This method is
advantageous to use when a significant number of codon
positions, for example, are to be of a specified sequence
since it alleviates the use of multiple reaction vessels.
Instead, in the vessel for the random sequence, a mixture
of four different monomers such as adenine, guanine,
cytosine and thymine nucleotides is used for the first and
second monomers in the codon. The codon is completed by
coupling a mixture of a pair of monomers of either guanine
and thymine or cytosine and adenine nucleotides at the
third monomer position. In the other vessel, nucleotide
monomers are coupled sequentially to yield the
predetermined codon sequence. Mixing of the two supports
yields a population of oligonucleotides containing both the
predetermined codon and the random codons at the desired
position. Synthesis can proceed by using this mixture of
supports in a single reaction vessel, for example, for
coupling additional predetermined codons or, further
dividing the mixture into two reaction vessels for
synthesis of additional random codons.

The two reaction vessel method can be used for codon 
synthesis within an oligonucleotide with a predetermined 
tuplet sequence by dividing the support mixture into two 
portions at the desired codon position to be randomized. 
Additionally, this method allows for the extent of 
randomization to be adjusted. For example, unequal mixing 
or dividing of the two supports will change the fraction of 
codons with predetermined sequences compared to those with 
random codons at the desired position. Unequal mixing and
dividing of supports can be useful when there is a need to
synthesize random codons at a significant number of
positions within an oligonucleotide of a longer or shorter
length.

The extent of randomization can also be adjusted by
using unequal mixtures of monomers in the first, second and
third monomer coupling steps of the random codon position.
The unequal mixtures can be in any or all of the coupling
steps to yield a population of codons enriched in sequences
reflective of the monomer proportions.
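
The effect of unequal support mixing and of unequal monomer mixtures on the codon population can be estimated with a simple frequency model. The sketch below assumes that each monomer couples in proportion to its share of the mixture, which is a simplification; the function names, the predetermined codon ATG and the 75 percent mixing fraction are hypothetical examples.

```python
from itertools import product


def codon_frequencies(first, second, third):
    """Codon frequencies from per-position monomer proportions.

    Each argument maps a monomer to its proportion in that coupling
    step. Monomers are assumed to couple in proportion to their share
    of the mixture.
    """
    freqs = {}
    for (a, pa), (b, pb), (c, pc) in product(
        first.items(), second.items(), third.items()
    ):
        freqs[a + b + c] = pa * pb * pc
    return freqs


def two_vessel_mixture(predetermined, random_freqs, fraction_predetermined):
    """Codon frequencies after mixing supports from the two vessels."""
    mixed = {
        codon: (1 - fraction_predetermined) * f
        for codon, f in random_freqs.items()
    }
    mixed[predetermined] = mixed.get(predetermined, 0.0) + fraction_predetermined
    return mixed


# Random vessel: all four monomers at the first and second positions,
# guanine and thymine at the third position.
equal = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
g_or_t = {"G": 0.5, "T": 0.5}
random_vessel = codon_frequencies(equal, equal, g_or_t)

# Three quarters of the supports come from the predetermined vessel.
biased = two_vessel_mixture("ATG", random_vessel, fraction_predetermined=0.75)

# Unequal monomer mixture at the first coupling step only.
skewed_first = {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}
enriched = codon_frequencies(skewed_first, equal, g_or_t)
```

Changing `fraction_predetermined` models unequal mixing of the two supports, while changing the per-position proportions models unequal monomer mixtures within the random codon.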

Synthesis of randomized oligonucleotides is performed
using methods well known to one skilled in the art. Linear
coupling of monomers can, for example, be accomplished
using phosphoramidite chemistry with a MilliGen/Biosearch
Cyclone Plus automated synthesizer as described by the
manufacturer (Millipore, Burlington, MA). Other
chemistries and automated synthesizers can be employed as
well and are known to one skilled in the art.

Synthesis of multiple codons can be performed without
modification to the synthesizer by separately synthesizing
the codons in individual sets of reactions. Alternatively,
modification of an automated DNA synthesizer can be
performed for the simultaneous synthesis of codons in
multiple reaction vessels.

In one embodiment, the invention provides a plurality
of procaryotic cells containing a diverse population of
expressible oligonucleotides operationally linked to
expression elements, the expressible oligonucleotides
having a desirable bias of random codon sequences produced
from diverse combinations of first and second
oligonucleotides having a desirable bias of random codon
sequences. The invention provides for a method of
constructing such a plurality of procaryotic cells as well.

The oligonucleotides synthesized by the above methods
can be used to express a plurality of random peptides which
are unbiased, diverse but biased toward a predetermined
sequence, or which contain at least one specified codon at
a predetermined position. The need will determine which
type of oligonucleotide is to be expressed to give the
resultant population of random peptides and is known to one
skilled in the art. Expression can be performed in any
compatible vector/host system. Such systems include, for
example, plasmids or phagemids in procaryotes such as E.
coli, yeast systems, and other eucaryotic systems such as
mammalian cells, but will be described herein in context
with its presently preferred embodiment, i.e. expression on
the surface of filamentous bacteriophage. Filamentous
bacteriophage can be, for example, M13, f1 and fd. Such
phage have circular single-stranded genomes and double
strand replicative DNA forms. Additionally, the peptides
can also be expressed in soluble or secreted form depending
on the need and the vector/host system employed.

Expression of random peptides on the surface of M13
can be accomplished, for example, using the vector system
shown in Figure 3. Construction of the vectors enabling
one of ordinary skill to make them is explicitly set out
in Examples I and II. The complete nucleotide sequences
are given in Figures 5, 6 and 7 (SEQ ID NOS: 1, 2 and 3,
respectively). This system produces random
oligonucleotides functionally linked to expression elements
and to gVIII by combining two smaller oligonucleotide
portions contained in separate vectors into a single
vector. The diversity of oligonucleotide species obtained
by this system or others described herein can be 5 x 10 or
greater. Diversity of less than 5 x 10 can also be
obtained and will be determined by the need and type of
random peptides to be expressed. The random combination of
two precursor portions into a larger oligonucleotide
increases the diversity of the population several fold and
has the added advantage of producing oligonucleotides
larger than what can be synthesized by standard methods.
Additionally, although the correlation is not known, when
the number of possible paths an oligonucleotide can take
during synthesis such as described herein is greater than
the number of beads, then there will be a correlation
between the synthesis path and the sequences obtained. By
combining oligonucleotide populations which are synthesized
separately, this correlation will be destroyed. Therefore,
any bias which may be inherent in the synthesis procedures
will be alleviated by joining two precursor portions into
a contiguous random oligonucleotide.

Populations of precursor oligonucleotides to be
combined into an expressible form are each cloned into
separate vectors. The two precursor portions which make up
the combined oligonucleotide correspond to the carboxy and
amino terminal portions of the expressed peptide. Each
precursor oligonucleotide can encode either the sense or
anti-sense strand, depending on the orientation of the
expression elements and the gene encoding the fusion
portion of the protein as well as the mechanism used to
join the two precursor oligonucleotides. For the vectors
shown in Figure 3, precursor oligonucleotides corresponding
to the carboxy terminal portion of the peptide encode the
sense strand. Those corresponding to the amino terminal
portion encode the anti-sense strand. Oligonucleotide
populations are inserted between the Eco RI and Sac I
restriction enzyme sites in M13IX22 and M13IX42 (Figure 3A
and B). M13IX42 (SEQ ID NO: 1) is the vector used for
sense strand precursor oligonucleotide portions and M13IX22
(SEQ ID NO: 2) is used for anti-sense precursor portions.

The populations of randomized oligonucleotides
inserted into the vectors are synthesized with Eco RI and
Sac I recognition sequences flanking opposite ends of the
random codon sequences. The sites allow annealing and
ligation of these single strand oligonucleotides into a
double stranded vector restricted with Eco RI and Sac I.
Alternatively, the oligonucleotides can be inserted into
the vector by standard mutagenesis methods. In this latter
method, single stranded vector DNA is isolated from the
phage and annealed with random oligonucleotides having
known sequences complementary to vector sequences. The
oligonucleotides are extended with DNA polymerase to
produce double stranded vectors containing the randomized
oligonucleotides.

The vector used for sense strand oligonucleotide
portions, M13IX42 (Figure 3B), contains down-stream and in
frame with the Eco RI and Sac I restriction sites a
sequence encoding the pseudo-wild type gVIII product. This
gene encodes the wild type M13 gVIII amino acid sequence
but has been changed at the nucleotide level to reduce
homologous recombination with the wild type gVIII contained
on the same vector. The wild type gVIII is present to
ensure that at least some functional, non-fusion coat
protein will be produced. The inclusion of a wild type
gVIII therefore reduces the possibility of non-viable phage
production and biological selection against certain peptide
fusion proteins. Differential regulation of the two genes
can also be used to control the relative ratio of the
pseudo and wild type proteins.

Also contained downstream and in frame with the Eco RI
and Sac I restriction sites is an amber stop codon. The
mutation is located six codons downstream from Sac I and
therefore lies between the inserted oligonucleotides and
the gVIII sequence. As was the function of the wild type
gVIII, the amber stop codon also reduces biological
selection when combining precursor portions to produce
expressible oligonucleotides. This is accomplished by
using a non-suppressor (sup O) host strain because
non-suppressor strains will terminate expression after the
oligonucleotide sequences but before the pseudo gVIII
sequences. Therefore, the pseudo gVIII will never be
expressed on the phage surface under these circumstances.
Instead, only soluble peptides will be produced.
Expression in a non-suppressor strain can be advantageously
utilized when one wishes to produce large populations of
soluble peptides. Stop codons other than amber, such as
opal and ochre, or molecular switches, such as inducible
repressor elements, can also be used to unlink peptide
expression from surface expression. Additional controls
exist as well and are described below.

The vector used for anti-sense strand oligonucleotide
portions, M13IX22 (Figure 3A), contains the expression
elements for the peptide fusion proteins. Upstream and in
frame with the Sac I and Eco RI sites in this vector is a
leader sequence for surface expression. A ribosome binding
site and Lac Z promoter/operator elements are present for
transcription and translation of the peptide fusion
proteins.

Both vectors contain a pair of Fok I restriction
enzyme sites (Figure 3A and B) for joining together two
precursor oligonucleotide portions and their vector
sequences. One site is located at the ends of each
precursor oligonucleotide which is to be joined. The
second Fok I site within the vectors is located at the end
of the vector sequences which are to be joined. The 5'
overhang of this second Fok I site has been altered to
encode a sequence which is not found in the overhangs
produced at the first Fok I site within the oligonucleotide
portions. The two sites allow the cleavage of each
circular vector into two portions and subsequent ligation
of essential components within each vector into a single
circular vector where the two oligonucleotide precursor
portions form a contiguous sequence (Figure 3C).
Non-compatible overhangs produced at the two Fok I sites
allow optimal conditions to be selected for performing
concatemerization or circularization reactions for joining
the two vector portions. Such selection of conditions can
be used to govern the reaction order and therefore increase
the efficiency of joining.

Fok I is a restriction enzyme whose recognition
sequence is distal to the point of cleavage. Distal
placement of the recognition sequence relative to
the cleavage point is important since if the two were
superimposed within the oligonucleotide portions to be
combined, it would lead to an invariant codon sequence at
the juncture. To alleviate the formation of invariant
codons at the juncture, Fok I recognition sequences can be
placed outside of the random codon sequence and still be
used to restrict within the random sequence. Subsequent
annealing of the single-strand overhangs produced by Fok I
and ligation of the two oligonucleotide precursor portions
allows the juncture to be formed. A variety of restriction
enzymes restrict DNA by this mechanism and can be used
instead of Fok I to join precursor oligonucleotides without
creating invariant codon sequences. Such enzymes include,
for example, Alw I, Bbu I, Bsp MI, Hga I, Hph I, Mbo II,
Mnl I, Ple I and Sfa NI. One skilled in the art knows how
to substitute Fok I recognition sequences for alternative
enzyme recognition sequences such as those above, and use
the appropriate enzyme for joining precursor
oligonucleotide portions.
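
The distal-cleavage idea can be sketched in a few lines of code. The sketch below assumes the commonly cited Fok I specificity, recognition sequence GGATG with cleavage 9 nucleotides downstream on the top strand and 13 nucleotides downstream on the bottom strand; these numbers are not given in the text above. The function name and the example sequence are hypothetical, and only sites on the top strand are searched.

```python
def foki_cuts(seq):
    """Locate top-strand GGATG sites and the 4-base overhang each produces.

    Returns a list of (site_start, top_cut, bottom_cut, overhang), where
    the cut positions are indices into the top strand and the overhang is
    written as top-strand sequence.
    """
    site = "GGATG"
    results = []
    start = seq.find(site)
    while start != -1:
        end = start + len(site)
        top_cut = end + 9       # top strand is cut 9 nt past the site
        bottom_cut = end + 13   # bottom strand is cut 13 nt past the site
        if bottom_cut <= len(seq):
            results.append((start, top_cut, bottom_cut, seq[top_cut:bottom_cut]))
        start = seq.find(site, start + 1)
    return results

# Hypothetical sequence: recognition site, 9-base spacer, then the bases
# that end up in the single-strand overhang.
example = "TT" + "GGATG" + "AAACCCGGG" + "TACG" + "TT"
print(foki_cuts(example))
```

Because the overhang lies well outside GGATG, the recognition site can sit in fixed vector sequence while the cut falls inside the random codons, which is the property the paragraph above relies on.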

Although the sequences of the precursor
oligonucleotides are random and will invariably have
oligonucleotides within the two precursor populations whose
sequences are sufficiently complementary to anneal after
cleavage, the efficiency of annealing can be increased by
insuring that the single-strand overhangs within one
precursor population will have a complementary sequence
within the second precursor population. This can be
accomplished by synthesizing a non-degenerate series of
known sequences at the Fok I cleavage site coding for each
of the twenty amino acids. Since the Fok I cleavage site
contains a four base overhang, forty different sequences
are needed to randomly encode all twenty amino acids. For
example, if two precursor populations of ten codons in
length are to be combined, then after the ninth codon
position is synthesized, the mixed population of supports
is divided into forty reaction vessels for each of the
populations and complementary sequences for each of the
corresponding reaction vessels between populations are
independently synthesized. The sequences are shown in
Tables III and VI of Example I where the oligonucleotides
on columns 1R through 40R form complementary overhangs with
the oligonucleotides on the corresponding columns 1L
through 40L once cleaved. The degenerate X positions in
Table VI are necessary to maintain the reading frame once
the precursor oligonucleotide portions are joined.
However, restriction enzymes which produce a blunt
end, such as Mnl I, can alternatively be used in place of
Fok I to alleviate the degeneracy introduced in maintaining
the reading frame.
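
The count of forty can be checked with a short sketch. It assumes, following the layout of Table III, that each four-base overhang consists of one leading base (T or G) followed by one of twenty codons, one codon per amino acid. The variable names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

# One codon per amino acid, as used in the right-half tables of Example I.
CODONS = [
    "TTT", "GTT", "TCT", "CCT", "TAT", "CAT", "TGT", "CGT", "CTG", "ATG",
    "CAG", "GAG", "ACT", "GCT", "AAT", "GAT", "TGG", "GGG", "ATA", "AAA",
]
LEADING_BASES = "TG"  # assumed from Table III: the fourth overhang base

# 20 codons x 2 leading bases = 40 defined four-base overhangs.
right_overhangs = [lead + codon for codon in CODONS for lead in LEADING_BASES]
# Each left-half vessel carries the complementary overhang.
left_overhangs = [reverse_complement(o) for o in right_overhangs]

print(len(set(right_overhangs)), right_overhangs[:4], left_overhangs[:4])
```

Each right-half overhang has exactly one partner in the left-half list, so pairing the forty vessels between the two populations guarantees that every cleaved end can find a complementary end.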

The last feature exhibited by each of the vectors is 
an amber stop codon located in an essential coding sequence 
within the vector portion lost during combining (Figure
3C). The amber stop codon is present to select for viable
phage produced from only the proper combination of
precursor oligonucleotides and their vector sequences into 
a single vector species. Other non-sense mutations or 
selectable markers can work as well. 

The combining step randomly brings together different
precursor oligonucleotides within the two populations into
a single vector (Figure 3C; M13IX). The vector sequences
donated from each independent vector, M13IX22 and M13IX42,
are necessary for production of viable phage. Also, since
the expression elements are contained in M13IX22 and the
gVIII sequences are contained in M13IX42, expression of
gVIII-peptide fusion proteins cannot be
accomplished until the sequences are linked.

The combining step is performed by restricting each
population of vectors containing randomized
oligonucleotides with Fok I and ligating the restricted
products (Figure 3C). Any vectors generated which contain
an amber stop codon will not produce viable phage when
introduced into a non-suppressor strain (Figure 3D).
Therefore, only the sequences which do not contain an amber
stop codon will make up the final population of vectors
contained in the library. These vector sequences are the
sequences required for surface expression of randomized
peptides. By analogous methodology, more than two vector
portions can be combined into a single vector which
expresses random peptides.

The invention provides for a method of selecting
peptides capable of being bound by a ligand binding protein
from a population of random peptides by (a) operationally
linking a diverse population of first oligonucleotides
having a desirable bias of random codon sequences to a
first vector; (b) operationally linking a diverse
population of second oligonucleotides having a desirable
bias of random codon sequences to a second vector; (c)
combining the vector products of steps (a) and (b) under
conditions where said populations of first and second
oligonucleotides are joined together into a population of
combined vectors; (d) introducing said population of
combined vectors into a compatible host under conditions
sufficient for expressing said population of random
peptides; and (e) determining the peptides which bind to
said binding protein. The invention also provides for
determining the encoding nucleic acid sequence of such
peptides as well.

Surface expression of the random peptide library is
performed in an amber suppressor strain. As described
above, the amber stop codon between the random codon
sequence and the gVIII sequence unlinks the two components
in a non-suppressor strain. Isolating the phage produced
from the non-suppressor strain and infecting a suppressor
strain will link the random codon sequences to the gVIII
sequence during expression (Figure 3E). Culturing the
suppressor strain after infection allows the expression of
all peptide species within the library as gVIII-peptide
fusion proteins. Alternatively, the DNA can be isolated
from the non-suppressor strain and then introduced into a
suppressor strain to accomplish the same effect.
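
The effect of the amber codon in the two host types can be sketched with a minimal translation routine. The sketch assumes the standard genetic code and assumes that the suppressor strain inserts glutamine at the amber codon (TAG), as supE-type suppressors do; the text does not specify which amino acid is inserted. The DNA string is a made-up stand-in for "random codons, amber codon, gVIII", not a sequence from the vectors.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code built in TCAG order; '*' marks a stop codon.
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}

def translate(dna, suppress_amber=False, suppressor_aa="Q"):
    """Translate in frame 1, optionally reading through amber (TAG)."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        aa = CODON_TABLE[codon]
        if codon == "TAG" and suppress_amber:
            aa = suppressor_aa  # assumed glutamine insertion
        if aa == "*":
            break               # translation terminates at a stop codon
        peptide.append(aa)
    return "".join(peptide)

insert_amber_fusion = "ATGGCT" + "TAG" + "GCTGAA"  # hypothetical example
print(translate(insert_amber_fusion))                       # non-suppressor
print(translate(insert_amber_fusion, suppress_amber=True))  # suppressor
```

In the non-suppressor case translation stops before the downstream portion, giving a soluble peptide; in the suppressor case the same DNA is read through as a single fusion.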

The level of expression of gVIII-peptide fusion
proteins can additionally be controlled at the
transcriptional level. The gVIII-peptide fusion proteins
are under the inducible control of the Lac Z
promoter/operator system. Other inducible promoters can
work as well and are known by one skilled in the art. For
high levels of surface expression, the suppressor library
is cultured in an inducer of the Lac Z promoter such as
isopropylthio-β-galactoside (IPTG). Inducible control is
beneficial because biological selection against
non-functional gVIII-peptide fusion proteins can be
minimized by culturing the library under non-expressing
conditions. Expression can then be induced only at the
time of screening to ensure that the entire population of
oligonucleotides within the library is accurately
represented on the phage surface. Also this can be used to
control the valency of the peptide on the phage surface.

The surface expression library is screened for
specific peptides which bind ligand binding proteins by
standard affinity isolation procedures. Such methods
include, for example, panning, affinity chromatography and
solid phase blotting procedures. Panning as described by
Parmley and Smith, Gene 73:305-318 (1988), which is
incorporated herein by reference, is preferred because high
titers of phage can be screened easily, quickly and in
small volumes. Furthermore, this procedure can select
minor peptide species within the population, which
otherwise would have been undetectable, and amplify them to
substantially homogeneous populations. The selected peptide
sequences can be determined by sequencing the nucleic acid
encoding such peptides after amplification of the phage
population.

The invention provides a plurality of procaryotic
cells containing a diverse population of oligonucleotides
having a desirable bias of random codon sequences that are
operationally linked to expression sequences. The
invention provides for methods of constructing such
populations of cells as well.

Random oligonucleotides synthesized by any of the
methods described previously can also be expressed on the
surface of filamentous bacteriophage, such as M13, for
example, without the joining together of precursor
oligonucleotides. A vector such as that shown in Figure 4,
M13IX30, can be used. This vector exhibits all the
functional features of the combined vector shown in Figure
3C for surface expression of gVIII-peptide fusion proteins.
The complete nucleotide sequence for M13IX30 (SEQ ID NO: 3)
is shown in Figure 7.

M13IX30 contains a wild type gVIII for phage viability
and a pseudo gVIII sequence for peptide fusions. The
vector also contains in frame restriction sites for
cloning. The cloning sites in this vector are Xho I,
Stu I and Spe I. Oligonucleotides should therefore be
synthesized with the appropriate complementary ends for
annealing and ligation or insertional mutagenesis.
Alternatively, the appropriate termini can be generated by
PCR technology. Between the restriction sites and the
pseudo gVIII sequence is an in-frame amber stop codon,
again ensuring complete viability of phage in constructing
and manipulating the library. Expression and screening are
performed as described above for the surface expression
library of oligonucleotides generated from precursor
portions.

Thus, the invention provides a method of selecting
peptides capable of being bound by a ligand binding protein
from a population of random peptides by (a) operationally
linking a diverse population of oligonucleotides having a
desirable bias of random codon sequences to expression
elements; (b) introducing said population of vectors into
a compatible host under conditions sufficient for
expressing said population of random peptides; and (c)
determining the peptides which bind to said binding
protein. Also provided is a method for determining the
encoding nucleic acid sequence of such selected peptides.

The following examples are intended to illustrate, but
not limit the invention.

EXAMPLE I 

Isolation and Characterization of Peptide Ligands Generated
From Right and Left Half Random Oligonucleotides

This example shows the synthesis of random
oligonucleotides and the construction and expression of
surface expression libraries of the encoded randomized
peptides. The random peptides of this example derive from
the mixing and joining together of two random
oligonucleotides. Also demonstrated is the isolation and
characterization of peptide ligands and their corresponding
nucleotide sequence for specific binding proteins.

Synthesis of Random Oligonucleotides

The synthesis of two randomized oligonucleotides which
correspond to smaller portions of a larger randomized
oligonucleotide is shown below. Each of the two smaller
portions makes up one-half of the larger oligonucleotide.
The populations of randomized oligonucleotides constituting
each half are designated the right and left half. Each
population of right and left halves is ten codons in
length with twenty random codons at each position. The
right half corresponds to the sense sequence of the
randomized oligonucleotides and encodes the carboxy terminal
half of the expressed peptides. The left half corresponds
to the anti-sense sequence of the randomized
oligonucleotides and encodes the amino terminal half of the
expressed peptides. The right and left halves of the
randomized oligonucleotide populations are cloned into
separate vector species and then joined so that
the right and left halves come together in random
combination to produce a single expression vector species
which contains a population of randomized
oligonucleotides. Introduction of the vector
population into an appropriate host produces filamentous
phage which express the random peptides on their surface.

The reaction vessels for oligonucleotide synthesis
were obtained from the manufacturer of the automated
synthesizer (Millipore, Burlington, MA; supplier of
MilliGen/Biosearch Cyclone Plus Synthesizer). The vessels
were supplied as packages containing empty reaction columns
(1 μmole), frits, crimps and plugs (MilliGen/Biosearch
catalog # [illegible]). Derivatized and underivatized
control pore glass, phosphoramidite nucleotides, and
synthesis reagents were also obtained from
MilliGen/Biosearch. Crimper and decrimper tools were
obtained from Fisher Scientific Co., Pittsburgh, PA
(Catalog numbers 06-406-20 and 06-406-[illegible],
respectively).

Ten reaction columns were used for right half
synthesis of random oligonucleotides ten codons in length.
The oligonucleotides have 5 monomers at their 3' end of the
sequence 5'GAGCT3' and 8 monomers at their 5' end of the
sequence 5'AATTCCAT3'. The synthesizer was fitted with a
column derivatized with a thymine nucleotide (T-column,
MilliGen/Biosearch # 0615.50) and was programmed to
synthesize the sequences shown in Table I for each of ten
columns in independent reaction sets. The sequence of the
last three monomers (from right to left since synthesis
proceeds 3' to 5') encodes the indicated amino acids:



Table I

Column        Sequence (5' to 3')    Amino Acids

column 1R     (T/G)TTGAGCT           Phe and Val
column 2R     (T/C)CTGAGCT           Ser and Pro
column 3R     (T/C)ATGAGCT           Tyr and His
column 4R     (T/C)GTGAGCT           Cys and Arg
column 5R     (C/A)TGGAGCT           Leu and Met
column 6R     (C/G)AGGAGCT           Gln and Glu
column 7R     (A/G)CTGAGCT           Thr and Ala
column 8R     (A/G)ATGAGCT           Asn and Asp
column 9R     (T/G)GGGAGCT           Trp and Gly
column 10R    A(T/A)AGAGCT           Ile and Lys


where the two monomers in parentheses denote a single
monomer position within the codon and indicate that an
equal mixture of each monomer was added to the reaction for
coupling. The monomer coupling reactions for each of the
10 columns were performed as recommended by the
manufacturer (amidite version S1.06, # 8400-050990, scale
1 μM). After the last coupling reaction, the columns were
washed with acetonitrile and lyophilized to dryness.
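
The parenthesis notation can be expanded mechanically to confirm that the ten two-monomer mixtures together cover twenty amino acids. The following sketch assumes the standard genetic code; the helper name expand is hypothetical, and the codon strings are the first three monomers of each Table I sequence.

```python
import re
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code built in TCAG order; '*' marks a stop codon.
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}

def expand(degenerate):
    """Expand a mixture such as '(T/G)TT' into ['TTT', 'GTT']."""
    tokens = re.findall(r"\(([ACGT](?:/[ACGT])+)\)|([ACGT])", degenerate)
    choices = [alt.split("/") if alt else [single] for alt, single in tokens]
    return ["".join(p) for p in product(*choices)]

TABLE_I_CODONS = [
    "(T/G)TT", "(T/C)CT", "(T/C)AT", "(T/C)GT", "(C/A)TG",
    "(C/G)AG", "(A/G)CT", "(A/G)AT", "(T/G)GG", "A(T/A)A",
]

encoded = {CODON_TABLE[c] for d in TABLE_I_CODONS for c in expand(d)}
for d in TABLE_I_CODONS:
    print(d, [(c, CODON_TABLE[c]) for c in expand(d)])
print(len(encoded), "*" in encoded)
```

Under the standard code the ten mixtures yield twenty distinct amino acids and no stop codon; the tenth mixture, A(T/A)A, expands to ATA (Ile) and AAA (Lys).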

Following synthesis, the plugs were removed from each
column using a decrimper and the reaction products were
poured into a single weigh boat. Initially the bead mass
increases, due to the weight of the monomers; however, at
later rounds of synthesis material is lost. In either
case, the material was equalized with underivatized control
pore glass and mixed thoroughly to obtain a random
distribution of all twenty codon species. The reaction
products were then aliquotted into 10 new reaction columns
by removing 25 mg of material at a time and placing it into
separate reaction columns. Alternatively, the reaction
products can be aliquotted by suspending the beads in a
liquid that is dense enough for the beads to remain
dispersed, preferably a liquid that is equal in density to
the beads, and then aliquoting equal volumes of the
suspension into separate reaction columns. The lip on the
inside of the columns where the frits rest was cleared of
material using vacuum suction with a syringe and 2o G 
needle. New frits were placed onto the lips, the plugs 
were fitted into the columns and were crimped into place 
using a crimper.

Synthesis of the second codon position was achieved
using the above 10 columns containing the random mixture of
reaction products from the first codon synthesis. The
monomer coupling reactions for the second codon position
are shown in Table II. An A in the first position (the 3'
terminal monomer of each sequence) means
that any monomer can be programmed into the synthesizer.
At that position, the first monomer position is not coupled
by the synthesizer since the software assumes that the
monomer is already attached to the column. An A also
denotes that the columns from the previous codon synthesis
should be placed on the synthesizer for use in the present
synthesis round. Reactions were again sequentially
repeated for each column as shown in Table II and the
reaction products washed and dried as described above.

Table II

Column        Sequence (5' to 3')    Amino Acids

column 1R     (T/G)TTA               Phe and Val
column 2R     (T/C)CTA               Ser and Pro
column 3R     (T/C)ATA               Tyr and His
column 4R     (T/C)GTA               Cys and Arg
column 5R     (C/A)TGA               Leu and Met
column 6R     (C/G)AGA               Gln and Glu
column 7R     (A/G)CTA               Thr and Ala
column 8R     (A/G)ATA               Asn and Asp
column 9R     (T/G)GGA               Trp and Gly
column 10R    A(T/A)AA               Ile and Lys

Randomization of the second codon position was achieved by
removing the reaction products from each of the columns and
thoroughly mixing the material. The material was again
divided into new reaction columns and prepared for monomer
coupling reactions as described above.



Random synthesis of the next seven codons (positions
3 through 9) proceeded identically to the cycle described
above for the second codon position and again used the
monomer sequences of Table II. Each of the newly repacked
columns containing the random mixture of reaction products
from synthesis of the previous codon position was used for
the synthesis of the subsequent codon position. After
synthesis of the codon at position nine and mixing of the
reaction products, the material was divided and repacked
into 40 different columns and the monomer sequences shown
in Table III were coupled to each of the 40 columns in
independent reactions. The oligonucleotides from each of
the 40 columns were mixed once more and cleaved from the
control pore glass as recommended by the manufacturer.
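
The repeated cycle of pooling, mixing, dividing into ten columns and coupling one of two codons per column can be mimicked with a small simulation. This is a simplified sketch: each simulated "bead" stands for a single growing chain (a real support carries many chains, so a column's two-monomer mixture would give both codons on one bead), the bead count is arbitrary, and the function name is hypothetical.

```python
import random

# Two-codon mixtures for the ten columns (from Table II, without the
# trailing placeholder A).
COLUMN_CODONS = [
    ("TTT", "GTT"), ("TCT", "CCT"), ("TAT", "CAT"), ("TGT", "CGT"),
    ("CTG", "ATG"), ("CAG", "GAG"), ("ACT", "GCT"), ("AAT", "GAT"),
    ("TGG", "GGG"), ("ATA", "AAA"),
]

def split_and_pool(n_beads, n_positions, rng):
    """Simulate pooled synthesis; returns one 5'-to-3' string per bead."""
    chains = [""] * n_beads
    for _ in range(n_positions):
        rng.shuffle(chains)                            # pool and mix supports
        for i in range(n_beads):
            column = i % len(COLUMN_CODONS)            # equal aliquots per column
            codon = rng.choice(COLUMN_CODONS[column])  # equal monomer mixture
            chains[i] = codon + chains[i]              # chain grows toward 5' end
    return chains

rng = random.Random(0)  # fixed seed so the run is repeatable
library = split_and_pool(n_beads=1000, n_positions=9, rng=rng)
print(len(library), len(set(library)), library[0])
```

Every simulated chain is nine codons long and built only from the twenty codons of Table II, while the order of codons along each chain is set by the random redistribution between rounds.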



Table III

Column        Sequence (5' to 3')

column 1R     AATTCTTTTA
column 2R     AATTCTGTTA
column 3R     AATTCGTTTA
column 4R     AATTCGGTTA
column 5R     AATTCTTCTA
column 6R     AATTCTCCTA
column 7R     AATTCGTCTA
column 8R     AATTCGCCTA
column 9R     AATTCTTATA
column 10R    AATTCTCATA
column 11R    AATTCGTATA
column 12R    AATTCGCATA
column 13R    AATTCTTGTA
column 14R    AATTCTCGTA
column 15R    AATTCGTGTA
column 16R    AATTCGCGTA
column 17R    AATTCTCTGA
column 18R    AATTCTATGA
column 19R    AATTCGCTGA
column 20R    AATTCGATGA
column 21R    AATTCTCAGA
column 22R    AATTCTGAGA
column 23R    AATTCGCAGA
column 24R    AATTCGGAGA
column 25R    AATTCTACTA
column 26R    AATTCTGCTA
column 27R    AATTCGACTA
column 28R    AATTCGGCTA
column 29R    AATTCTAATA
column 30R    AATTCTGATA
column 31R    AATTCGAATA
column 32R    AATTCGGATA
column 33R    AATTCTTGGA
column 34R    AATTCTGGGA
column 35R    AATTCGTGGA
column 36R    AATTCGGGGA
column 37R    AATTCTATAA
column 38R    AATTCTAAAA
column 39R    AATTCGATAA
column 40R    AATTCGAAAA



Left half synthesis of random oligonucleotides
proceeded similarly to the right half synthesis. This half
of the oligonucleotide corresponds to the anti-sense
sequence of the encoded randomized peptides. Thus, the
complementary sequences of the codons in Tables I through
III are synthesized. The left half oligonucleotides also
have 5 monomers at their 3' end of the sequence 5'GAGCT3'
and 8 monomers at their 5' end of the sequence
5'AATTCCAT3'. The rounds of synthesis, washing, drying,
mixing, and dividing are as described above.
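
The relationship between the right-half codon mixtures and their left-half (anti-sense) counterparts is a reverse complement applied to the parenthesis notation. A minimal sketch, with a hypothetical helper name, is given below; it handles only the bases A, C, G and T and the "(X/Y)" mixture notation used in the tables.

```python
import re

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def revcomp_degenerate(seq):
    """Reverse-complement a sequence written with '(X/Y)' mixtures."""
    tokens = re.findall(r"\([ACGT](?:/[ACGT])+\)|[ACGT]", seq)
    out = []
    for token in reversed(tokens):
        if token.startswith("("):
            # Complement each alternative inside the mixed position.
            alts = token[1:-1].split("/")
            out.append("(" + "/".join(COMPLEMENT[a] for a in alts) + ")")
        else:
            out.append(COMPLEMENT[token])
    return "".join(out)

for sense in ["(T/G)TT", "(C/A)TG", "(T/G)GG", "A(T/A)A"]:
    print(sense, "->", revcomp_degenerate(sense))
```

Applied to the right-half mixtures, this gives the anti-sense mixtures that the left-half columns are programmed with; for example (T/G)TT becomes AA(A/C).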

For the first codon position, the synthesizer was
fitted with a T-column and programmed to synthesize the
sequences shown in Table IV for each of ten columns in
independent reaction sets. As with right half synthesis,
the sequence of the last three monomers (from right to
left) encodes the indicated amino acids:

Table IV

Column        Sequence (5' to 3')    Amino Acids

column 1L     AA(A/C)GAGCT           Phe and Val
column 2L     AG(A/G)GAGCT           Ser and Pro
column 3L     AT(A/G)GAGCT           Tyr and His
column 4L     AC(A/G)GAGCT           Cys and Arg
column 5L     CA(G/T)GAGCT           Leu and Met
column 6L     CT(G/C)GAGCT           Gln and Glu
column 7L     AG(T/C)GAGCT           Thr and Ala
column 8L     AT(T/C)GAGCT           Asn and Asp
column 9L     CC(A/C)GAGCT           Trp and Gly
column 10L    T(A/T)TGAGCT           Ile and Lys

Following washing and drying, the plugs for each column
were removed, and the material was mixed and aliquotted
into ten new reaction
columns as described above. Synthesis of the second codon
position was achieved using these ten columns containing
the random mixture of reaction products from the first
codon synthesis. The monomer coupling reactions for the
second codon position are shown in Table V.



Table V

Column        Sequence (5' to 3')    Amino Acids

column 1L     AA(A/C)A               Phe and Val
column 2L     AG(A/G)A               Ser and Pro
column 3L     AT(A/G)A               Tyr and His
column 4L     AC(A/G)A               Cys and Arg
column 5L     CA(G/T)A               Leu and Met
column 6L     CT(G/C)A               Gln and Glu
column 7L     AG(T/C)A               Thr and Ala
column 8L     AT(T/C)A               Asn and Asp
column 9L     CC(A/C)A               Trp and Gly
column 10L    T(A/T)TA               Ile and Lys

Again, randomization of the second codon position was
achieved by removing the reaction products from each of the
columns and thoroughly mixing the beads. The beads were
repacked into ten new reaction columns.

Random synthesis of the next seven codon positions
proceeded identically to the cycle described above for the
second codon position and again used the monomer sequences
of Table V. After synthesis of the codon at position nine
and mixing of the reaction products, the material was
divided and repacked into 40 different columns and the
monomer sequences shown in Table VI were coupled to each of
the 40 columns in independent reactions.



Table VI

Column        Sequence (5' to 3')

column 1L     AATTCCATAAAAXXA
column 2L     AATTCCATAAACXXA
column 3L     AATTCCATAACAXXA
column 4L     AATTCCATAACCXXA
column 5L     AATTCCATAGAAXXA
column 6L     AATTCCATAGACXXA
column 7L     AATTCCATAGGAXXA
column 8L     AATTCCATAGGCXXA
column 9L     AATTCCATATAAXXA
column 10L    AATTCCATATACXXA
column 11L    AATTCCATATGAXXA
column 12L    AATTCCATATGCXXA
column 13L    AATTCCATACAAXXA
column 14L    AATTCCATACACXXA
column 15L    AATTCCATACGAXXA
column 16L    AATTCCATACGCXXA
column 17L    AATTCCATCAGAXXA
column 18L    AATTCCATCAGCXXA
column 19L    AATTCCATCATAXXA
column 20L    AATTCCATCATCXXA
column 21L    AATTCCATCTGAXXA
column 22L    AATTCCATCTGCXXA
column 23L    AATTCCATCTCAXXA
column 24L    AATTCCATCTCCXXA
column 25L    AATTCCATAGTAXXA
column 26L    AATTCCATAGTCXXA
column 27L    AATTCCATAGCAXXA
column 28L    AATTCCATAGCCXXA
column 29L    AATTCCATATTAXXA
column 30L    AATTCCATATTCXXA
column 31L    AATTCCATATCAXXA
column 32L    AATTCCATATCCXXA
column 33L    AATTCCATCCAAXXA
column 34L    AATTCCATCCACXXA
column 35L    AATTCCATCCCAXXA
column 36L    AATTCCATCCCCXXA
column 37L    AATTCCATTATAXXA
column 38L    AATTCCATTATCXXA
column 39L    AATTCCATTTTAXXA
column 40L    AATTCCATTTTCXXA



25 



The first two monomers denoted by an "X" represent an equal 
mixture of all four nucleotides at that position. This is 
necessary to retain a relatively unbiased codon sequence at 
the junction between right and left half oligonucleotides. 
The above right and left half random oligonucleotides were 
cleaved and purified from the supports and used in 
constructing the surface expression libraries below. 
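
As a rough illustration of what the "X" positions imply, the following Python sketch expands one Table VI entry into the individual sequences it stands for. The function name `expand_mixed_positions` is hypothetical, and equal representation of the four nucleotides is the idealized case described above rather than a measured distribution.

```python
from itertools import product

BASES = "ACGT"

def expand_mixed_positions(template, mixed="X"):
    """Return every sequence encoded by a template in which each
    `mixed` character stands for an equal mixture of A, C, G and T."""
    # Each position becomes either the four bases or the single fixed base.
    slots = [BASES if ch == mixed else ch for ch in template]
    return ["".join(combo) for combo in product(*slots)]

# Column 1L from Table VI: two mixed positions give 4**2 = 16 sequences.
variants = expand_mixed_positions("AATTCCATAAAAXXA")
print(len(variants))
print(variants[0])
```

Each column of the left half synthesis therefore carries sixteen junction variants in addition to the diversity already present on the support.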



Vector Construction 



Two M13-based vectors, M13IX42 (SEQ ID NO: 1) and 
M13IX22 (SEQ ID NO: 2), were constructed for the cloning 
and propagation of right and left half populations of 
random oligonucleotides, respectively. The vectors were 
specially constructed to facilitate the random joining and 
subsequent expression of right and left half 
oligonucleotide populations. Each vector within the 
population contains one right and one left half 
oligonucleotide from the population joined together to form 
a single contiguous oligonucleotide with random codons 
which is twenty-two codons in length. The resultant 
population of vectors is used to construct a surface 
expression library. 

M13IX42, or the right-half vector, was constructed to 
harbor the right half populations of randomized 
oligonucleotides. M13mp18 (Pharmacia, Piscataway, NJ) was 
the starting vector. This vector was genetically modified 
to contain, in addition to the encoded wild type M13 gene 
VIII already present in the vector: (1) a pseudo-wild type 
M13 gene VIII sequence with a stop codon (amber) placed 
between it and an Eco RI-Sac I cloning site for randomized 
oligonucleotides; (2) a pair of Fok I sites to be used for 
joining with M13IX22, the left-half vector; (3) a second 
amber stop codon placed on the opposite side of the vector 
from the portion being combined with the left-half vector; 
and (4) various other mutations to remove redundant 
restriction sites and the amino terminal portion of Lac Z. 

The pseudo-wild type M13 gene VIII was used for 
surface expression of random peptides. The pseudo-wild 
type gene encodes the identical amino acid sequence as that 
of the wild type gene; however, the nucleotide sequence has 
been altered so that only 63% identity exists between this 
gene and the encoded wild type gene VIII. Modification of 
the gene VIII nucleotide sequence used for surface 
expression reduces the possibility of homologous 
recombination with the wild type gene VIII contained on the 
same vector. Additionally, the wild type M13 gene VIII was 
retained in the vector system to ensure that at least some 
functional, non-fusion coat protein would be produced. The 
inclusion of wild type gene VIII therefore reduces the 
possibility of non-viable phage production from the random 
peptide fusion genes. 

The pseudo-wild type gene VIII was constructed by 
chemically synthesizing a series of oligonucleotides which 
encode both strands of the gene. The oligonucleotides are 
presented in Table VII (SEQ ID NOS: 7 through 16). 

TABLE VII 

Pseudo-Wild Type Gene VIII Oligonucleotide Series 

Top Strand 
Oligonucleotides    Sequence (5' to 3') 

VIII 03             GATCC TAG GCT GAA GGC GAT 
                    GAC CCT GCT AAG GCT GC 
VIII 04             A TTC AAT AGT TTA CAG GCA 
                    AGT GCT ACT GAG TAC A 
VIII 05             TT GGC TAC GCT TGG GCT ATG 
                    GTA GTA GTT ATA GTT 
VIII 06             GGT GCT ACC ATA GGG ATT AAA 
                    TTA TTC AAA AAG TT 
VIII 07             T ACG AGC AAG GCT TCT TA 

Bottom Strand 
Oligonucleotides    Sequence (5' to 3') 

VIII 08             AGC TTA AGA AGC CTT GCT CGT 
                    AAA CTT TTT GAA TAA TTT 
VIII 09             AAT CCC TAT GGT AGC ACC AAC 
                    TAT AAC TAC TAC CAT 
VIII 10             AGC CCA AGC GTA GCC AAT GTA 
                    CTC AGT AGC ACT TG 
VIII 11             C CTG TAA ACT ATT GAA TGC 
                    AGC CTT AGC AGG GTC 
VIII 12             ATC GCC TTC AGC CTA G 
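
The idea behind the pseudo-wild type gene, the same amino acids encoded by a deliberately different nucleotide sequence, can be sketched in Python as follows. The two 18-nucleotide fragments are made-up illustrations (they are not the full gene and not a quotation of either gene VIII sequence), and the helper names `translate` and `percent_identity` are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Hypothetical fragments: same peptide, different codon choices.
original_fragment = "GCTGAGGGTGACGATCCC"
recoded_fragment = "GCTGAAGGCGATGACCCT"

print(translate(original_fragment), translate(recoded_fragment))
print(round(percent_identity(original_fragment, recoded_fragment), 1))
```

Changing only third-position (wobble) nucleotides, as in this toy case, lowers nucleotide identity while leaving the encoded peptide unchanged, which is the property relied on to discourage homologous recombination between the two gene copies.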



Except for the terminal oligonucleotides VIII 03 (SEQ 
ID NO: 7) and VIII 08 (SEQ ID NO: 12), the above 
oligonucleotides (oligonucleotides VIII 04-VIII 07 and 09-12 
(SEQ ID NOS: 8 through 11 and 13 through 16)) were mixed 
at 200 ng each in 10 µl final volume and phosphorylated 
with T4 polynucleotide kinase (Pharmacia, Piscataway, NJ) 
with 1 mM ATP at 37°C for 1 hour. The reaction was stopped 
at 65°C for 5 minutes. Terminal oligonucleotides were 
added to the mixture and annealed into double-stranded form 
by heating to 65°C for 5 minutes, followed by cooling to 
room temperature over a period of 30 minutes. The annealed 
oligonucleotides were ligated together with 1.0 U of T4 DNA 
ligase (BRL). The annealed and ligated oligonucleotides 
yield a double-stranded DNA flanked by a Bam HI site at its 
5' end and by a Hind III site at its 3' end. A 
translational stop codon (amber) immediately follows the 
Bam HI site. The gene VIII sequence begins with the codon 
GAA (Glu) two codons 3' to the stop codon. The 
double-stranded insert was phosphorylated using T4 DNA kinase 
(Pharmacia, Piscataway, NJ) and ATP (10 mM Tris-HCl, pH 
7.5, 10 mM MgCl2) and cloned in frame with the Eco RI and 
Sac I sites within the M13 polylinker. To do so, M13mp18 
was digested with Bam HI (New England Biolabs, Beverly, 
MA) and Hind III (New England Biolabs) and combined at a 
molar ratio of 1:10 with the double-stranded insert. The 
ligations were performed at 16°C overnight in 1X ligase 
buffer (50 mM Tris-HCl, pH 7.8, 10 mM MgCl2, 20 mM DTT, 1 mM 
ATP, 50 µg/ml BSA) containing 1.0 U of T4 DNA ligase (New 
England Biolabs). The ligation mixture was transformed 
into a host and screened for positive clones using standard 
procedures in the art. 
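
The annealing geometry at the Bam HI end can be checked with a short Python sketch. It assumes the 5' end of VIII 03 reads GATCCTAGGCTGAAGGCGAT (the garbled letter "O" in the extracted table taken as "C") and that VIII 12 reads ATCGCCTTCAGCCTAG; the helper name `reverse_complement` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Assumed readings of two Table VII oligonucleotides (see lead-in).
top_5prime_end = "GATCCTAGGCTGAAGGCGAT"   # first 20 nt of VIII 03
bottom_terminal = "ATCGCCTTCAGCCTAG"      # VIII 12

# VIII 12 should pair with the top strand except for its first few bases.
pairs = reverse_complement(top_5prime_end).startswith(bottom_terminal)
overhang = top_5prime_end[: len(top_5prime_end) - len(bottom_terminal)]

print(pairs)                    # whether VIII 12 is complementary to VIII 03
print(overhang)                 # unpaired 5' bases left on the top strand
print(top_5prime_end[5:8])      # codon immediately after the GATCC end
print(top_5prime_end[11:14])    # codon two positions 3' of that codon
```

Under these assumptions the unpaired bases form the four-nucleotide GATC extension expected of a Bam HI-compatible end, the next triplet is the amber codon TAG, and GAA follows two codons later, consistent with the description above.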

Several mutations were generated within the right-half 
vector to yield functional M13IX42. The mutations were 
generated using the method of Kunkel et al., Meth. Enzymol. 
154:367-382 (1987), which is incorporated herein by 
reference, for site-directed mutagenesis. The reagents, 
strains and protocols were obtained from a Bio Rad 
Mutagenesis kit (Bio Rad, Richmond, CA) and mutagenesis 
performed as recommended by the manufacturer. 

A Fok I site used for joining the right and left 
halves was generated 8 nucleotides 5' to the unique Eco RI 
site using the oligonucleotide 
5'-CTCGAATTCGTACATCCTGGTCATAGC-3' (SEQ ID NO: 17). The second 
Fok I site retained in the vector is naturally encoded at position 
3547; however, the sequence within the overhang was changed 
to encode CTTC. Two Fok I sites were removed from the 
vector at positions 239 and 7244 of M13mp18 as well as the 
Hind III site at the end of the pseudo gene VIII sequence 
using the mutant oligonucleotides 
5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID NO: 18) and 
5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 19), respectively. 
New Hind III and Mlu I sites were 
also introduced at positions 3919 and 3951 of M13IX42. The 
oligonucleotides used for this mutagenesis had the 
sequences 5'-ATATATTTTAGTAAGCTTCATCTTCT-3' (SEQ ID NO: 20) 
and 5'-GACAAAGAACGCGTGAAAACTTT-3' (SEQ ID NO: 21), 
respectively. The amino terminal portion of Lac Z was 
deleted by oligonucleotide-directed mutagenesis using the 
mutant oligonucleotide 
5'-GCGGGCCTCTTCGCTATTGCTTAAGAAGCCTTGCT-3' (SEQ ID NO: 22). 
This deletion also removed a third M13mp18 derived Fok I 
site. The distance between the Eco RI and Sac I sites was 
increased to ensure complete double digestion by inserting 
a spacer sequence. The spacer sequence was inserted using 
the oligonucleotide 
5'-TTCAGCCTAGGATCCGCCGAGCTCTCCTACCTGCGAATTCGTACATCC-3' 
(SEQ ID NO: 23). Finally, an amber stop codon was placed at 
position 4492 using the mutant oligonucleotide 
5'-TGGATTATACTTCTAAATAATGGA-3' (SEQ ID NO: 24). The amber 
stop codon is used as a biological selection to ensure the 
proper recombination of vector sequences to bring together 
right and left halves of the randomized oligonucleotides. 
In constructing the above mutations, all changes made in an 
M13 coding region were performed such that the amino acid 
sequence remained unaltered. It should be noted that 
several mutations within M13mp18 were found which differed 
from the published sequence. Where known, these sequence 
differences are recorded herein as found and therefore may 
not correspond exactly to the published sequence of 
M13mp18. 
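
A simple way to sanity-check mutagenic oligonucleotides of this kind is to scan them for the recognition sequences they are meant to create. The Python sketch below does this for three of the oligonucleotides quoted above, as read from the text. The recognition sequences are the standard ones for these enzymes; the function name `sites_in` is hypothetical, and the scan checks both orientations because Fok I has a non-palindromic site.

```python
SITES = {
    "Eco RI": "GAATTC",
    "Sac I": "GAGCTC",
    "Bam HI": "GGATCC",
    "Hind III": "AAGCTT",
    "Mlu I": "ACGCGT",
    "Fok I": "GGATG",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def sites_in(oligo):
    """List the enzymes whose recognition sequence occurs on either strand."""
    return [
        name
        for name, site in SITES.items()
        if site in oligo or reverse_complement(site) in oligo
    ]

oligos = {
    "SEQ ID NO: 17": "CTCGAATTCGTACATCCTGGTCATAGC",
    "SEQ ID NO: 20": "ATATATTTTAGTAAGCTTCATCTTCT",
    "SEQ ID NO: 21": "GACAAAGAACGCGTGAAAACTTT",
}

for label, seq in oligos.items():
    print(label, sites_in(seq))
```

With these sequences the scan reports Eco RI and Fok I for SEQ ID NO: 17, Hind III for SEQ ID NO: 20 and Mlu I for SEQ ID NO: 21, in line with the purposes stated for each oligonucleotide.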

The sequence of the resultant vector, M13IX42, is 
shown in Figure 5 (SEQ ID NO: 1). Figure 3A also shows 
M13IX42 where each of the elements necessary for producing 
a surface expression library between right and left half 
randomized oligonucleotides is marked. The sequence 
between the two Fok I sites shown by the arrow is the 
portion of M13IX42 which is to be combined with a portion 
of the left-half vector to produce random oligonucleotides 
as fusion proteins of gene VIII. 

M13IX22, or the left-half vector, was constructed to 
harbor the left half populations of randomized 
oligonucleotides. This vector was constructed from M13mp19 
(Pharmacia, Piscataway, NJ) and contains: (1) two Fok I 
sites for mixing with M13IX42 to bring together the left 
and right halves of the randomized oligonucleotides; (2) 
sequences necessary for expression such as a promoter and 
signal sequence and translation initiation signals; (3) an 
Eco RI-Sac I cloning site for the randomized 
oligonucleotides; and (4) an amber stop codon for 
biological selection in bringing together right and left 
half oligonucleotides. 

Of the two Fok I sites used for mixing M13IX22 with 
M13IX42, one is naturally encoded in M13mp18 and M13mp19 
(at position 3547). As with M13IX42, the overhang within 
this naturally occurring Fok I site was changed to CTTC. 
The other Fok I site was introduced after construction of 
the translation initiation signals by site-directed 
mutagenesis using the oligonucleotide 
5'-TAACACTCATTCCGGATGGAA.TCTGGAGTCTGGGT-3' (SEQ ID NO: 25). 

The translation initiation signals were constructed by 
annealing of overlapping oligonucleotides as described 
above to produce a double-stranded insert containing a 5' 
Eco RI site and a 3' Hind III site. The overlapping 
oligonucleotides are shown in Table VIII (SEQ ID NOS: 26 
through 34) and were ligated as a double-stranded insert 
between the Eco RI and Hind III sites of M13mp18 as 
described for the pseudo gene VIII insert. The ribosome 
binding site (AGGAGAC) is located in oligonucleotide 015 
(SEQ ID NO: 26) and the translation initiation codon (ATG) 
is the first three nucleotides of oligonucleotide 016 (SEQ 
ID NO: 27). 

TABLE VIII 

Oligonucleotide Series 

Oligonucleotide     Sequence (5' to 3') 

015                 AATT C GCC AAG GAG ACA GTC AT 
016                 AATG AAA TAC CTA TTG CCT ACG GCA 
                    GCC GCT GGA TTG TT 
017                 ATTA CTC GCT GCC CAA CCA GCC ATG 
                    GCC GAG CTC GTG AT 
018                 GACC CAG ACT CCA GATATC CAA CAG 
                    GAA TGA GTG TTA AT 
019                 TCT AGA ACG CGT C 
020                 ACGT G ACG CGT TCT AGA AT TAA 
                    CACTCA TTC CTG T 
021                 TG GAT ATC TGG AGT CTG GGT CAT 
                    CAC GAG CTC GGC CAT G 
022                 GC TGG TTG GGC AGC GAG TAA TAA 
                    CAA TCC AGC GGC TGC C 
023                 GT AGG CAA TAG GTA TTT CAT TAT 
                    GAC TGT CCT TGG CG 

Oligonucleotide 017 (SEQ ID NO: 27) contained a Sac I 
restriction site 67 nucleotides downstream from the ATG 
codon. The naturally occurring Eco RI site was removed and 
a new site introduced 25 nucleotides downstream from the 
Sac I. Oligonucleotides 
5'-TGACTGTCTCCTTGGCGTGTGAAATTGTTA-3' (SEQ ID NO: 35) and 
5'-TAACACTCATTCCGGATGGAATTCTGGAGTCTGGGT-3' (SEQ ID NO: 36) 
were used to generate each of the 
mutations, respectively. An amber stop codon was also 
introduced at position 3263 of M13mp18 using the 
oligonucleotide 5'-CAATTTTATCCTAAATCTTACCAAC-3' (SEQ ID NO: 
37). 

In addition to the above mutations, a variety of other 
modifications were made to remove certain sequences and 
redundant restriction sites. The Lac Z ribosome binding 
site was removed when the original Eco RI site in M13mp18 
was mutated. Also, the Fok I sites at positions 239, 6361 
and 7244 of M13mp18 were likewise removed with mutant 
oligonucleotides 5'-CATTTTTGCAGATGGCTTAGA-3' (SEQ ID NO: 
38), 5'-CGAAAGGGGGGTGTGCTGCAA-3' (SEQ ID NO: 39) and 
5'-TAGCATTAACGTCCAATA-3' (SEQ ID NO: 40), respectively. 
Again, mutations within the coding region did not alter the 
amino acid sequence. 

The resultant vector, M13IX22, is 7320 base pairs in 
length, the sequence of which is shown in Figure 6 (SEQ ID 
NO: 2). The Sac I and Eco RI cloning sites are at 
positions 6290 and 6314, respectively. Figure 3A also 
shows M13IX22 where each of the elements necessary for 
producing a surface expression library between right and 
left half randomized oligonucleotides is marked. 

Library Construction 

Each population of right and left half randomized 
oligonucleotides from columns 1R through 40R and columns 1L 
through 40L is cloned separately into M13IX42 and M13IX22, 
respectively, to create sublibraries of right and left half 
randomized oligonucleotides. Therefore, a total of eighty 
sublibraries are generated. Each population of randomized 
oligonucleotides is maintained separately until the final 
screening step to ensure maximum efficiency of 
annealing of right and left half oligonucleotides. The 
greater efficiency increases the total number of randomized 
oligonucleotides which can be obtained. Alternatively, one 
can combine all forty populations of right half 
oligonucleotides (columns 1R-40R) into one population and 
of left half oligonucleotides (columns 1L-40L) into a 
second population to generate just one sublibrary for each. 

For the generation of sublibraries, each of the above 
populations of randomized oligonucleotides is cloned 
separately into the appropriate vector. The right half 
oligonucleotides are cloned into M13IX42 to generate 
sublibraries M13IX42.1R through M13IX42.40R. The left half 
oligonucleotides are similarly cloned into M13IX22 to 
generate sublibraries M13IX22.1L through M13IX22.40L. Each 
vector contains unique Eco RI and Sac I restriction enzyme 
sites which produce 5' and 3' single-stranded overhangs, 
respectively, when digested. The single strand overhangs 
are used for the annealing and ligation of the 
complementary single-stranded random oligonucleotides. 

The randomized oligonucleotide populations are cloned 
between the Eco RI and Sac I sites by sequential digestion 
and ligation steps. Each vector is treated with an excess 
of Eco RI (New England Biolabs) at 37°C for 2 hours 
followed by addition of 4-24 units of calf intestinal 
alkaline phosphatase (Boehringer Mannheim, Indianapolis, 
IN). Reactions are stopped by phenol/chloroform extraction 
and ethanol precipitation. The pellets are resuspended in 
an appropriate amount of distilled or deionized water 
(dH2O). About 10 pmol of vector is mixed with a 5000-fold 
molar excess of each population of randomized 
oligonucleotides in 10 µl of 1X ligase buffer (50 mM 
Tris-HCl, pH 7.8, 10 mM MgCl2, 20 mM DTT, 1 mM ATP, 50 µg/ml BSA) 
containing 1.0 U of T4 DNA ligase (BRL, Gaithersburg, MD). 
The ligation is incubated at 16°C for 16 hours. Reactions 
are stopped by heating at 75°C for 15 minutes and the DNA 
is digested with an excess of Sac I (New England Biolabs) 
for 2 hours. Sac I is inactivated by heating at 75°C for 
15 minutes and the volume of the reaction mixture is 
adjusted to 300 µl with an appropriate amount of 10X ligase 
buffer and dH2O. One unit of T4 DNA ligase (BRL) is added 
and the mixture is incubated overnight at 16°C. The DNA is 
ethanol precipitated and resuspended in TE (10 mM Tris-HCl, 
pH 8.0, 1 mM EDTA). DNA from each ligation is 
electroporated into XL1 Blue™ cells (Stratagene, La Jolla, 
CA), as described below, to generate the sublibraries. 

E. coli XL1 Blue™ is electroporated as described by 
Smith et al., Focus 12:38-40 (1990), which is incorporated 
herein by reference. The cells are prepared by inoculating 
a fresh colony of XL1s into 5 ml of SOB without magnesium 
(20 g bacto-tryptone, 5 g bacto-yeast extract, 0.584 g 
NaCl, 0.186 g KCl, dH2O to 1,000 ml) and grown with 
vigorous aeration overnight at 37°C. SOB without magnesium 
(500 ml) is inoculated at 1:1000 with the overnight culture 
and grown with vigorous aeration at 37°C until the OD550 is 
0.8 (about 2 to 3 h). The cells are harvested by 
centrifugation at 5,000 rpm (2,600 x g) in a GS3 rotor 
(Sorvall, Newtown, CT) at 4°C for 10 minutes, resuspended 
in 500 ml of ice-cold 10% (v/v) sterile glycerol and 
centrifuged and resuspended a second time in the same 
manner. After a third centrifugation, the cells are 
resuspended in 10% sterile glycerol at a final volume of 
about 2 ml, such that the OD550 of the suspension is 200 to 
300. Usually, resuspension is achieved in the 10% glycerol 
that remains in the bottle after pouring off the supernate. 
Cells are frozen in 40 µl aliquots in microcentrifuge tubes 
using a dry ice-ethanol bath and stored frozen at -70°C. 

Frozen cells are electroporated by thawing slowly on 
ice before use and mixing with about 10 pg to 500 ng of 
vector per 40 µl of cell suspension. A 40 µl aliquot is 
placed in an 0.1 cm electroporation chamber (Bio-Rad, 
Richmond, CA) and pulsed once at 0°C using a 200 Ω parallel 
resistor, 25 µF, 1.83 kV, which gives a pulse length (τ) of 
about 4 ms. A 10 µl aliquot of the pulsed cells is diluted 
into 1 ml SOC (98 ml SOB plus 1 ml of 2 M MgCl2 and 1 ml of 
2 M glucose) in a 12- x 75-mm culture tube, and the culture 
is shaken at 37°C for 1 hour prior to culturing in 
selective media (see below). 
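
The pulse settings can be related to the quoted pulse length with a little arithmetic. The Python sketch below treats the discharge as a simple RC decay and assumes the sample acts as a resistance in parallel with the 200 Ω resistor. That circuit model and the derived sample resistance are illustrative assumptions, not values stated in the text.

```python
def time_constant(resistance_ohm, capacitance_farad):
    """Exponential decay time constant of an RC discharge, in seconds."""
    return resistance_ohm * capacitance_farad

def parallel(r1, r2):
    """Combined resistance of two resistors in parallel, in ohms."""
    return r1 * r2 / (r1 + r2)

R_PARALLEL = 200.0      # ohms, from the text
CAPACITANCE = 25e-6     # farads, from the text
VOLTAGE = 1830.0        # volts, from the text
GAP_CM = 0.1            # cuvette gap in cm, from the text

tau_nominal = time_constant(R_PARALLEL, CAPACITANCE)   # resistor alone
field_strength = VOLTAGE / GAP_CM                      # volts per cm

# Assumed model: what sample resistance would shorten tau to about 4 ms?
r_effective = 4e-3 / CAPACITANCE
r_sample = 1.0 / (1.0 / r_effective - 1.0 / R_PARALLEL)

print(round(tau_nominal * 1e3, 2), "ms nominal")
print(round(field_strength / 1e3, 1), "kV/cm")
print(round(r_sample), "ohm sample resistance (assumed model)")
```

The resistor alone would give a 5 ms time constant, so an observed value nearer 4 ms is what one would expect if the cell suspension contributes a finite resistance of several hundred ohms in parallel.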

Each of the eighty sublibraries is cultured using 
methods known to one skilled in the art. Such methods can 
be found in Sambrook et al., Molecular Cloning: A 
Laboratory Manual, Cold Spring Harbor Laboratory, Cold 
Spring Harbor, 1989, and in Ausubel et al., Current 
Protocols in Molecular Biology, John Wiley and Sons, New 
York, 1989, both of which are incorporated herein by 
reference. Briefly, the above 1 ml sublibrary cultures 
were grown up by diluting 50-fold into 2XYT media (16 g 
tryptone, 10 g yeast extract, 5 g NaCl) and culturing at 
37°C for 5-8 hours. The bacteria were pelleted by 
centrifugation at 10,000 x g. The supernatant containing 
phage was transferred to a sterile tube and stored at 4°C. 

Double strand vector DNA containing right and left 
half randomized oligonucleotide inserts is isolated from 
the cell pellet of each sublibrary. Briefly, the pellet is 
washed in TE (10 mM Tris, pH 8.0, 1 mM EDTA) and 
recollected by centrifugation at 7,000 rpm for 5 minutes in a 
Sorvall centrifuge (Newtown, CT). Pellets are resuspended 
in 6 ml of 10% sucrose, 50 mM Tris, pH 8.0. 3.0 ml of 10 
mg/ml lysozyme is added and incubated on ice for 20 
minutes. 12 ml of 0.2 M NaOH, 1% SDS is added followed by 
10 minutes on ice. The suspensions are then incubated on 
ice for 20 minutes after addition of 7.5 ml of 3 M NaOAc, 
pH 4.6. The samples are centrifuged at 15,000 rpm for 15 
minutes at 4°C, RNased and extracted with 
phenol/chloroform, followed by ethanol precipitation. The 
pellets are resuspended, weighed and an equal weight of 
CsCl is dissolved into each tube until a density of 1.60 
g/ml is achieved. EtBr is added to 600 µg/ml and the 
double-stranded DNA is isolated by equilibrium 
centrifugation in a TV-1665 rotor (Sorvall) at 50,000 rpm 
for 6 hours. These DNAs from each right and left half 
sublibrary are used to generate forty libraries in which 
the right and left halves of the randomized 
oligonucleotides have been randomly joined together. 

Each of the forty libraries is produced by joining 
together one right half and one left half sublibrary. The 
two sublibraries joined together correspond to the same 
column number for right and left half random 
oligonucleotide synthesis. For example, sublibrary 
M13IX42.1R is joined with M13IX22.1L to produce the surface 
expression library M13IX.1RL. In the alternative situation 
where only two sublibraries are generated from the combined 
populations of all right half synthesis and all left half 
synthesis, only one surface expression library would be 
produced. 
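
The bookkeeping for the eighty sublibraries and forty libraries follows directly from the naming scheme given above and can be sketched in Python. The function names are hypothetical; only the name patterns M13IX42.nR, M13IX22.nL and M13IX.nRL come from the text.

```python
def sublibrary_names(n_columns=40):
    """Return the right-half and left-half sublibrary names, one per column."""
    right = [f"M13IX42.{i}R" for i in range(1, n_columns + 1)]
    left = [f"M13IX22.{i}L" for i in range(1, n_columns + 1)]
    return right, left

def paired_libraries(n_columns=40):
    """Map each surface expression library to the two sublibraries
    (same column number) that are joined to make it."""
    right, left = sublibrary_names(n_columns)
    return {
        f"M13IX.{i}RL": (r, l)
        for i, (r, l) in enumerate(zip(right, left), start=1)
    }

libraries = paired_libraries()
print(len(libraries))
print(libraries["M13IX.1RL"])
```

Pairing by column number keeps each library traceable to a single right-half and a single left-half synthesis, which is what allows the populations to be maintained separately until the final screening step.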

For the random joining of each right and left half 
oligonucleotide population into a single surface 
expression vector species, the DNAs isolated from each 
sublibrary are digested with an excess of Fok I (New England 
Biolabs). The reactions are stopped by phenol/chloroform 
extraction, followed by ethanol precipitation. Pellets are 
resuspended in dH2O. Each surface expression library is 
generated by ligating equal molar amounts (5-10 pmol) of 
Fok I digested DNA isolated from corresponding right and 
left half sublibraries in 10 µl of 1X ligase buffer 
containing 1.0 U of T4 DNA ligase (Bethesda Research 
Laboratories, Gaithersburg, MD). The ligations proceed 
overnight at 16°C and are electroporated into the sup O 
strain MK30-3 (Boehringer Mannheim Biochemical (BMB), 
Indianapolis, IN) as previously described for XL1 cells. 
Because MK30-3 is sup O, only the vector portions encoding 
the randomized oligonucleotides which come together will 
produce viable phage. 

Screening of Surface Expression Libraries 

Purified phage are prepared from 50 ml liquid cultures 
of XL1 Blue™ cells (Stratagene) which are infected at a 
m.o.i. of 10 from the phage stocks stored at 4°C. The 
cultures are induced with 2 mM IPTG. Supernatants from all 
cultures are combined and cleared by two centrifugations, 
and the phage are precipitated by adding 1/7.5 volumes of 
PEG solution (25% PEG-8000, 2.5 M NaCl), followed by 
incubation at 4°C overnight. The precipitate is recovered 
by centrifugation for 90 minutes at 10,000 x g. Phage 
pellets are resuspended in 25 ml of 0.01 M Tris-HCl, pH 
7.6, 1.0 mM EDTA, and 0.1% Sarkosyl and then shaken slowly 
at room temperature for 30 minutes. The solutions are 
adjusted to 0.5 M NaCl and to a final concentration of 5% 
polyethylene glycol. After 2 hours at 4°C, the 
precipitates containing the phage are recovered by 
centrifugation for 1 hour at 15,000 x g. The precipitates 
are resuspended in 10 ml of NET buffer (0.1 M NaCl, 1.0 mM 
EDTA, and 0.01 M Tris-HCl, pH 7.6), mixed well, and the 
phage repelleted by centrifugation at 170,000 x g for 3 
hours. The phage pellets are subsequently resuspended 
overnight in 2 ml of NET buffer and subjected to cesium 
chloride centrifugation for 18 hours at 110,000 x g (3.86 
g of cesium chloride in 10 ml of buffer). Phage bands are 
collected, diluted 7-fold with NET buffer, recentrifuged at 
170,000 x g for 3 hours, resuspended, and stored at 4°C in 
0.3 ml of NET buffer containing 0.1 mM sodium azide. 
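
For the first precipitation step, the final PEG and NaCl concentrations contributed by the added stock follow from a simple dilution calculation, sketched below in Python. The function name is hypothetical, and the calculation deliberately ignores any salt already present in the culture supernatant, so the NaCl figure is a lower bound rather than the true final concentration.

```python
def final_concentration(stock_concentration, added_volumes):
    """Concentration contributed by a stock when `added_volumes` volumes
    of it are added to 1 volume of sample (same units as the stock)."""
    return stock_concentration * added_volumes / (1.0 + added_volumes)

ADDED = 1.0 / 7.5   # volumes of PEG solution per volume of supernatant

peg_percent = final_concentration(25.0, ADDED)   # % PEG-8000
nacl_molar = final_concentration(2.5, ADDED)     # M NaCl from the stock only

print(round(peg_percent, 2), "% PEG-8000")
print(round(nacl_molar, 3), "M NaCl contributed by the stock")
```

Adding 1/7.5 volume dilutes the stock 8.5-fold, giving roughly 2.9% PEG and about 0.29 M added NaCl under these assumptions.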



Ligand binding proteins used for panning on 
streptavidin coated dishes are first biotinylated and then 
absorbed against UV-inactivated blocking phage (see below). 
The biotinylating reagents are dissolved in 
dimethylformamide at a ratio of 2.4 mg solid NHS-SS-Biotin 
(sulfosuccinimidyl 2-(biotinamido)ethyl-1,3'-dithiopropionate; 
Pierce, Rockford, IL) to 1 ml solvent and 
used as recommended by the manufacturer. Small-scale 
reactions are accomplished by mixing 1 µl dissolved reagent 
with 43 µl of 1 mg/ml ligand binding protein diluted in 
sterile bicarbonate buffer (0.1 M NaHCO3, pH 8.6). After 2 
hours at 25°C, residual biotinylating reagent is reacted 
with 500 µl 1 M ethanolamine (pH adjusted to 9 with HCl) 
for an additional 2 hours. The entire sample is diluted 
with 1 ml TBS containing 1 mg/ml BSA, concentrated to about 
50 µl on a Centricon 30 ultra-filter (Amicon), and washed 
on the same filter three times with 2 ml TBS and once with 
1 ml TBS containing 0.02% NaN3 and 7 x 10^^ UV-inactivated 
blocking phage (see below); the final retentate (60-80 µl) 
is stored at 4°C. Ligand binding proteins biotinylated 
with the NHS-SS-Biotin reagent are linked to biotin via a 
disulfide-containing chain. 

UV-irradiated M13 phage were used for blocking binding 
proteins which fortuitously bound filamentous phage in 
general. M13mp8 (Messing and Vieira, Gene 19:262-276 
(1982), which is incorporated herein by reference) was 
chosen because it carries two amber stop codons, which 
ensure that the few phage surviving irradiation will not 
grow in the sup O strains used to titer the surface 
expression libraries. A 5 ml sample containing 5 x 10 
M13mp8 phage, purified as described above, was placed in a 
small petri plate and irradiated with a germicidal lamp at 
a distance of two feet for 7 minutes (flux 150 µW/cm2). 
NaN3 was added to 0.02% and phage particles concentrated to 
10^^ particles/ml on a Centricon 30-kDa ultrafilter 
(Amicon). 

For panning, polystyrene petri plates (60 
Falcon; Becton Dickinson, Lincoln Park, NJ) are incubated 
with 1 ml of 1 mg/ml of streptavidin (BMB) in 0.1 M NaHCO3, 
pH 8.0-0.02% NaN3 in a small, air-tight plastic box 
overnight in a cold room. The next day streptavidin is 
removed and replaced with at least 10 ml blocking solution 
(29 mg/ml of BSA; 3 µg/ml of streptavidin; 0.1 M NaHCO3, pH 
8.6-0.02% NaN3) and incubated at least 1 hour at room 
temperature. The blocking solution is removed and plates 
are washed rapidly three times with Tris buffered saline 
containing 0.5% Tween 20 (TBS-0.5% Tween 20). 

Selection of phage expressing peptides bound by the 
ligand binding proteins is performed with 5 µl (2.7 
ligand binding protein) of blocked biotinylated ligand 
binding proteins reacted with a 50 µl portion of each 
library. Each mixture is incubated overnight at 4°C, 
diluted with 1 ml TBS-0.5% Tween 20, and transferred to a 
streptavidin-coated petri plate prepared as described 
above. After rocking 10 minutes at room temperature, 
unbound phage are removed and plates washed ten times with 
TBS-0.5% Tween 20 over a period of 30-90 minutes. Bound 
phage are eluted from plates with 800 µl sterile elution 
buffer (1 mg/ml BSA, 0.1 M HCl, pH adjusted to 2.2 with 
glycine) for 15 minutes and eluates neutralized with 48 µl 
2 M Tris (pH unadjusted). A 20 µl portion of each eluate 
is titered on MK30-3 concentrated cells with dilutions of 
input phage. 

A second round of panning is performed by treating 750 
µl of first eluate from each library with 5 mM DTT for 10 
minutes to break disulfide bonds linking biotin groups to 
residual biotinylated binding proteins. The treated eluate 
is concentrated on a Centricon 30 ultrafilter (Amicon), 
washed three times with TBS-0.5% Tween 20, and concentrated 
to a final volume of about 50 µl. Final retentate is 
transferred to a tube containing 5.0 µl (2.7 ligand 
binding protein) blocked biotinylated ligand binding 
proteins and incubated overnight. The solution is diluted 
with 1 ml TBS-0.5% Tween 20, panned, and eluted as 
described above on fresh streptavidin-coated petri plates. 
The entire second eluate (800 µl) is neutralized with 48 µl 
2 M Tris, and 20 µl is titered simultaneously with the 
first eluate and dilutions of the input phage. 

Individual phage populations are purified through 2 to 
3 rounds of plaque purification. Briefly, the second 
eluate titer plates are lifted with nitrocellulose filters 
(Schleicher & Schuell, Inc., Keene, NH) and processed by 
washing for 15 minutes in TBS (10 mM Tris-HCl, pH 7.2, 150 
mM NaCl), followed by an incubation with shaking for an 
additional 1 hour at 37°C with TBS containing 5% nonfat dry 
milk (TBS-5% NDM) at 0.5 ml/cm2. The wash is discarded and 
fresh TBS-5% NDM is added (0.1 ml/cm2) containing the ligand 
binding protein between 1 nM to 100 mM, preferably between 
1 to 100 µM. All incubations are carried out in 
heat-sealable pouches (Sears). Incubation with the ligand 
binding protein proceeds for 12-16 hours at 4°C with 
shaking. The filters are removed from the bags and washed 
3 times for 30 minutes at room temperature with 150 ml of 
TBS containing 0.1% NDM and 0.2% NP-40 (Sigma, St. Louis, 
MO). The filters are then incubated for 2 hours at room 
temperature in antiserum against the ligand binding protein 
at an appropriate dilution in TBS-0.5% NDM, washed in 3 
changes of TBS containing 0.1% NDM and 0.2% NP-40 as 
described above and incubated in TBS containing 0.1% NDM 
and 0.2% NP-40 with 1 x 10^ cpm of 125I-labeled Protein A 
(specific activity = 2.1 x 10^ cpm/µg). After a washing 
with TBS containing 0.1% NDM and 0.2% NP-40 as described 
above, the filters are wrapped in Saran Wrap and exposed to 
Kodak X-Omat x-ray film (Kodak, Rochester, NY) for 1-12 
hours at -70°C using Dupont Cronex Lightning Plus 
Intensifying Screens (Dupont, Wilmington, DE). 

Positive plaques identified are cored with the large 
end of a pasteur pipet and placed into 1 ml of SM (5.8 g 
NaCl, 2 g MgSO4.7H2O, 50 ml 1 M Tris-HCl, pH 7.5, 5 ml 2% 
gelatin, to 1000 ml with dH2O) plus 1-3 drops of CHCl3 and 
incubated at 37°C 2-3 hours or overnight at 4°C. The phage 
are diluted 1:500 in SM and 2 µl are added to 300 µl of XL1 
cells plus 3 ml of soft agar per 100 mm plate. The XL1 
cells are prepared for plating by growing a colony 
overnight in 10 ml LB (10 g bacto-tryptone, 5 g bacto-yeast 
extract, 10 g NaCl, 1000 ml dH2O) containing 100 µl of 20% 
maltose and 100 µl of 1 M MgSO4. The bacteria are pelleted 
by centrifugation at 2000 x g for 10 minutes and the pellet 
is resuspended gently in 10 ml of 10 mM MgSO4. The 
suspension is diluted 4-fold by adding 30 ml of 10 mM MgSO4 
to give an OD^^ of approximately 0.5. The second and third 
round screens are identical to that described above except 
that the plaques are cored with the small end of a pasteur 
pipet and placed into 0.5 ml SM plus a drop of CHCl3, and 
1-5 µl of the phage following incubation are used for plating 
without dilution. At the end of the third round of 
purification, an individual plaque is picked and the 
templates prepared for sequencing. 

Template Preparation and Sequencing 

Templates are prepared for sequencing by inoculating 
a 1 ml culture of 2XYT containing a 1:100 dilution of an 
overnight culture of XL1 with an individual plaque. The 
plaques are picked using a sterile toothpick. The culture 
is incubated at 37°C for 5-6 hours with shaking and then 
transferred to a 1.5 ml microfuge tube. 200 µl of PEG 
solution is added, followed by vortexing, and the tube is 
placed on ice for 10 minutes. The phage precipitate is recovered by 
centrifugation in a microfuge at 12,000 x g for 5 minutes. 
The supernatant is discarded and the pellet is resuspended 
in 230 µl of TE (10 mM Tris-HCl, pH 7.5, 1 mM EDTA) by 
gently pipeting with a yellow pipet tip. Phenol (200 µl) 
is added, followed by a brief vortex, and the sample is 
microfuged to separate the phases. The aqueous phase is transferred to 
a separate tube and extracted with 200 µl of 
phenol/chloroform (1:1) as described above for the phenol 
extraction. A 0.1 volume of 3 M NaOAc is added, followed 
by addition of 2.5 volumes of ethanol, and the templates are 
precipitated at -20°C for 20 minutes. The precipitated templates are 
recovered by centrifugation in a microfuge at 12,000 x g 
for 8 minutes. The pellet is washed in 70% ethanol, dried 
and resuspended in 25 µl TE. Sequencing was performed 
using a Sequenase™ sequencing kit following the protocol 
supplied by the manufacturer (U.S. Biochemical, Cleveland, 
OH). 



EXAMPLE II

Isolation and Characterization of Peptide Ligands Generated
From Oligonucleotides Having Random Codons at Two
Predetermined Positions

This example shows the generation of a surface
expression library from a population of oligonucleotides
having randomized codons. The oligonucleotides are ten
codons in length and are cloned into a single vector
species for the generation of an M13 gene VIII-based surface
expression library. The example also shows the selection
of peptides for a ligand binding protein and
characterization of their encoded nucleic acid sequences.

Oligonucleotide Synthesis

Oligonucleotides were synthesized as described in
Example I. The synthesizer was programmed to synthesize
the sequences shown in Table IX. These sequences
correspond to the first random codon position synthesized
and 3' flanking sequences of the oligonucleotide which
hybridizes to the leader sequence in the vector. The
complementary sequences are used for insertional
mutagenesis of the synthesized population of
oligonucleotides.

Table IX

Column        Sequence (5' to 3')

column 1      AA(A/C)GGTTGGTCGGTACCGG
column 2      AG(A/G)GGTTGGTCGGTACCGG
column 3      AT(A/G)GGTTGGTCGGTACCGG
column 4      AC(A/G)GGTTGGTCGGTACCGG
column 5      CA(G/T)GGTTGGTCGGTACCGG
column 6      CT(G/C)GGTTGGTCGGTACCGG
column 7      AG(T/C)GGTTGGTCGGTACCGG
column 8      AT(T/C)GGTTGGTCGGTACCGG
column 9      CC(A/C)GGTTGGTCGGTACCGG
column 10     T(A/T)TGGTTGGTCGGTACCGG
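The first-position triplets of Table IX can be checked computationally. The following is a minimal sketch, assuming that the table lists the antisense strand, so that each triplet is reverse-complemented before it is translated with the standard genetic code. The names COLUMN_TRIPLETS, expand and revcomp are illustrative only and do not appear in the original.

```python
import re
from itertools import product

# Degenerate triplet at the first random codon position of each column (Table IX).
COLUMN_TRIPLETS = [
    "AA(A/C)", "AG(A/G)", "AT(A/G)", "AC(A/G)", "CA(G/T)",
    "CT(G/C)", "AG(T/C)", "AT(T/C)", "CC(A/C)", "T(A/T)T",
]

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expand(pattern):
    """Expand the single '(X/Y)' mixed position into its concrete triplets."""
    prefix, choices, suffix = re.fullmatch(
        r"([ACGT]*)\(([ACGT/]+)\)([ACGT]*)", pattern
    ).groups()
    return [prefix + base + suffix for base in choices.split("/")]


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Assumption: the table shows the antisense strand, so translate the complement.
encoded = {}
for column, pattern in enumerate(COLUMN_TRIPLETS, start=1):
    for triplet in expand(pattern):
        sense = revcomp(triplet)
        encoded[sense] = (column, CODON_TABLE[sense])

amino_acids = "".join(sorted(aa for _, aa in encoded.values()))
print(len(encoded), "sense codons encode", amino_acids)
```

Under that assumption each column contributes two codons, and the twenty resulting sense codons specify twenty different amino acids with no stop codon.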



The next eight random codon positions were synthesized
as described for Table V in Example I. Following the ninth
position synthesis, the reaction products were once more
combined, mixed and redistributed into 10 new reaction
columns. Synthesis of the last random codon position and
5' flanking sequences is shown in Table X.

Table X

Column        Sequence (5' to 3')

column 1      AGGATCCGCCGAGCTCAA(A/C)A
column 2      AGGATCCGCCGAGCTCAG(A/G)A
column 3      AGGATCCGCCGAGCTCAT(A/G)A
column 4      AGGATCCGCCGAGCTCAG(A/G)A
column 5      AGGATCCGCCGAGCTCCA(G/T)A
column 6      AGGATCCGCCGAGCTCCT(G/C)A
column 7      AGGATCCGCCGAGCTCAG(T/C)A
column 8      AGGATCCGCCGAGCTCAT(T/C)A
column 9      AGGATCCGCCGAGCTCCC(A/C)A
column 10     AGGATCCGCCGAGCTCT(A/T)TA






The reaction products were mixed once more and the
oligonucleotides cleaved and purified as recommended by the
manufacturer. The purified population of oligonucleotides
was used to generate a surface expression library as
described below.



Vector Construction

The vector used for generating surface expression
libraries from a single oligonucleotide population (i.e.,
without joining together of right and left half
oligonucleotides) is described below. The vector is an
M13-based expression vector which directs the synthesis of
gene VIII-peptide fusion proteins (Figure 4). This vector
exhibits all the functions that the combined right and left
half vectors of Example I exhibit.






An M13-based vector was constructed for the cloning
and surface expression of populations of random
oligonucleotides (Figure 4, M13IX30). M13mp19 (Pharmacia)
was the starting vector. This vector was modified to
contain, in addition to the encoded wild type M13 gene
VIII: (1) a pseudo-wild type gene VIII sequence with
an amber stop codon placed between it and the restriction
sites for cloning oligonucleotides; (2) Stu I, Spe I and
Xho I restriction sites in frame with the pseudo-wild type
gVIII for cloning oligonucleotides; (3) sequences necessary
for expression, such as a promoter, signal sequence and
translation initiation signals; and (4) various other
mutations to remove redundant restriction sites and the
amino terminal portion of Lac Z.

Construction of M13IX30 was performed in four steps.
In the first step, a precursor vector containing the pseudo
gene VIII and various other mutations was constructed,
M13IX01F. The second step involved the construction of a
small cloning site in a separate M13mp18 vector to yield
M13IX03. In the third step, expression sequences and
cloning sites were constructed in M13IX03 to generate the
intermediate vector M13IX04B. The fourth step involved the
incorporation of the newly constructed sequences from the
intermediate vector into M13IX01F to yield M13IX30.
Incorporation of these sequences linked them with the
pseudo gene VIII.

Construction of the precursor vector M13IX01F was
similar to that of M13IX42 described in Example I except
for the following features: (1) M13mp19 was used as the
starting vector; (2) the Fok I site 5' to the unique Eco
RI site was not incorporated and the overhang at the
naturally occurring Fok I site at position 3547 was not
changed to 5'-CTTC-3'; (3) the spacer sequence was not
incorporated between the Eco RI and Sac I sites; and (4)
the amber codon at position 4492 was not incorporated.

In the second step, M13mp18 was mutated to remove the
5' end of Lac Z up to the Lac i binding site and including
the Lac Z ribosome binding site and start codon.
Additionally, the polylinker was removed and a Mlu I site
was introduced in the coding region of Lac Z. A single
oligonucleotide was used for these mutations and had the
sequence 5'-AAACGACGGCCAGTGCCAAGTGACGCGTGTGAAATTGTTATCC-3'
(SEQ ID NO: 41). Restriction enzyme sites for Hind III
and Eco RI were introduced downstream of the Mlu I site
using the oligonucleotide
5'-GGCGAAAGGGAATTCTGCAAGGCGATTAAGCTTGGGTAACGCC-3' (SEQ ID NO:
42). These modifications of M13mp18 yielded the vector
M13IX03.

The expression sequences and cloning sites were
introduced into M13IX03 by chemically synthesizing a series
of oligonucleotides which encode both strands of the
desired sequence. The oligonucleotides are presented in
Table XI (SEQ ID NOS: 43 through 50).

TABLE XI
M13IX30 Oligonucleotide Series

Top Strand
Oligonucleotides   Sequence (5' to 3')

084                GGCGTTACCCAAGCTTTGTACATGGAGAAAATAAAG
027                TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT
028                TACTGTTTACCCCTGTGACAAAAGCCGCCCAGGTCCAGCTGC
029                TCGAGTCAGGCCTATTGTGCCCAGGGATTGTACTAGTGGATCCG

Bottom Strand
Oligonucleotides   Sequence (5' to 3')

085                TGGCGAAAGGGAATTCGGATCCACTAGTACAATCCCTG
031                GGCACAATAGGCCTGACTCGAGCAGCTGGACCAGGGCGGCTT
032                TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGTGTGCCA
033                GTGCAATAGTGCTTTGTTTCACTTTATTTTCTCCATGTACAA



The above oligonucleotides except for the terminal
oligonucleotides 084 (SEQ ID NO: 43) and 085 (SEQ ID NO:
47) of Table XI were mixed, phosphorylated, annealed and
ligated to form a double stranded insert as described in
Example I. However, instead of cloning directly into the
intermediate vector the insert was first amplified by PCR
using the terminal oligonucleotides 084 (SEQ ID NO: 43) and
085 (SEQ ID NO: 47) as primers. The terminal
oligonucleotide 084 (SEQ ID NO: 43) contains a Hind III
site 10 nucleotides internal to its 5' end.
Oligonucleotide 085 (SEQ ID NO: 47) has an Eco RI site at
its 5' end. Following amplification, the products were
restricted with Hind III and Eco RI and ligated as
described in Example I into the polylinker of M13mp18






digested with the same two enzymes. The resultant double
stranded insert contained a ribosome binding site, a
translation initiation codon followed by a leader sequence
and three restriction enzyme sites for cloning random
oligonucleotides (Xho I, Stu I, Spe I). The vector was
named M13IX04.
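The positions of the restriction sites described above can be located directly in the Table XI sequences. The sketch below is an illustration under stated assumptions: it uses the oligonucleotide sequences as transcribed in Table XI and the usual recognition sequences for the five enzymes. The helper offset and the variable names are hypothetical and not part of the original.

```python
# Top strand oligonucleotides of Table XI, 5' to 3'.
TOP = {
    "084": "GGCGTTACCCAAGCTTTGTACATGGAGAAAATAAAG",
    "027": "TGAAACAAAGCACTATTGCACTGGCACTCTTACCGTTACCGT",
    "028": "TACTGTTTACCCCTGTGACAAAAGCCGCCCAGGTCCAGCTGC",
    "029": "TCGAGTCAGGCCTATTGTGCCCAGGGATTGTACTAGTGGATCCG",
}
# Bottom strand terminal oligonucleotide (PCR primer).
PRIMER_085 = "TGGCGAAAGGGAATTCGGATCCACTAGTACAATCCCTG"

# Recognition sequences of the enzymes named in the text.
SITES = {
    "Hind III": "AAGCTT",
    "Eco RI": "GAATTC",
    "Xho I": "CTCGAG",
    "Stu I": "AGGCCT",
    "Spe I": "ACTAGT",
}


def offset(seq, enzyme):
    """Nucleotides lying 5' of the first occurrence of the site (-1 if absent)."""
    return seq.find(SITES[enzyme])


# The assembled top strand reads 084-027-028-029.
top_strand = "".join(TOP[name] for name in ("084", "027", "028", "029"))

hind_offset = offset(TOP["084"], "Hind III")
eco_offset = offset(PRIMER_085, "Eco RI")
cloning_order = sorted(
    ("Xho I", "Stu I", "Spe I"), key=lambda enzyme: offset(top_strand, enzyme)
)
print(hind_offset, eco_offset, cloning_order)
```

In these strings the Hind III site of 084 begins after ten nucleotides, and the three cloning sites occur in the order Xho I, Stu I, Spe I, with the Xho I site spanning the 028/029 junction.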

During cloning of the double-stranded insert, it was
found that one of the GCC codons in oligonucleotide 028
and its complement in 031 was deleted. Since this deletion
did not affect function, the final construct is missing one
of the two GCC codons. Additionally, oligonucleotide 032
contained a GTG codon where a GAG codon was needed.
Mutagenesis was performed using the oligonucleotide
5'-TAACGGTAAGAGTGCCAGTGC-3' (SEQ ID NO: 51) to convert the
codon to the desired sequence. The resultant intermediate
vector was named M13IX04B.
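The single-base change introduced by SEQ ID NO: 51 can be seen by aligning it against the bottom strand. This is a rough sketch, assuming the mutagenic oligonucleotide has the same polarity as oligonucleotides 032 and 033 and spans their junction; best_alignment is a hypothetical helper written for this illustration, not a published routine.

```python
OLIGO_032 = "TTGTCACAGGGGTAAACAGTAACGGTAACGGTAAGTGTGCCA"
OLIGO_033 = "GTGCAATAGTGCTTTGTTTCACTTTATTTTCTCCATGTACAA"
MUTAGENIC = "TAACGGTAAGAGTGCCAGTGC"  # SEQ ID NO: 51


def best_alignment(template, oligo):
    """Slide the oligo along the template without gaps.

    Returns (start, mismatches) for the placement with the fewest
    mismatches, where mismatches is a list of
    (index_in_oligo, template_base, oligo_base).
    """
    best = None
    for start in range(len(template) - len(oligo) + 1):
        window = template[start:start + len(oligo)]
        diffs = [
            (i, t, o) for i, (t, o) in enumerate(zip(window, oligo)) if t != o
        ]
        if best is None or len(diffs) < len(best[1]):
            best = (start, diffs)
    return best


# Bottom strand read 5' to 3' across the 032/033 junction.
bottom = OLIGO_032 + OLIGO_033
start, mismatches = best_alignment(bottom, MUTAGENIC)
print(start, mismatches)
```

The best placement differs from the template at one position, a T in 032 replaced by an A, which is the GTG to GAG conversion described in the text.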



The fourth step in constructing M13IX30 involved
inserting the expression and cloning sequences from
M13IX04B upstream of the pseudo-wild type gVIII in
M13IX01F. This was accomplished by digesting M13IX04B with
Dra III and Bam HI and gel isolating the 700 base pair
insert containing the sequences of interest. M13IX01F was
likewise digested with Dra III and Bam HI. The insert was
combined with the double digested vector at a molar ratio
of 3:1 and ligated as described in Example I. It should be
noted that all modifications in the vectors described
herein were confirmed by sequence analysis. The sequence
of the final construct, M13IX30, is shown in Figure 7 (SEQ
ID NO: 3). Figure 4 also shows M13IX30 where each of the
elements necessary for surface expression of randomized
oligonucleotides is marked.
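The 3:1 molar ratio used for the ligation above translates into masses once fragment lengths are known. The sketch below shows the arithmetic only. The 700 base pair insert length is from the text, whereas the vector fragment length and the vector mass are hypothetical placeholders, since the text gives neither.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio):
    """Mass of insert giving the requested insert:vector molar ratio.

    For double-stranded DNA, moles are proportional to mass / length,
    so the average mass per base pair cancels out.
    """
    return molar_ratio * vector_ng * insert_bp / vector_bp


# Assumptions for illustration only (not stated in the text):
VECTOR_NG = 100.0   # hypothetical amount of digested vector
VECTOR_BP = 7294    # stand-in length; the digested M13IX01F length is not given

needed = insert_mass_ng(VECTOR_NG, VECTOR_BP, insert_bp=700, molar_ratio=3.0)
print(f"{needed:.1f} ng of 700 bp insert per {VECTOR_NG:.0f} ng vector")
```

Because the insert is roughly one tenth the length of the vector, a threefold molar excess corresponds to well under the vector mass.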
Library Construction, Screening and Characterization of
Encoded Oligonucleotides

Construction of an M13IX30 surface expression library
is accomplished identically to that described in Example I
for sublibrary construction except the oligonucleotides
described above are inserted into M13IX30 by mutagenesis
instead of by ligation. The library is constructed and
propagated on MK30-3 (BMB) and phage stocks are prepared
for infection of XL1 cells and screening. The surface
expression library is screened and encoding
oligonucleotides characterized as described in Example I.

EXAMPLE III 

Isolation and Characterization of Peptide Ligands
Generated from Right and Left Half
Degenerate Oligonucleotides

This example shows the construction and expression
of a surface expression library of degenerate
oligonucleotides. The encoded peptides of this example
derive from the mixing and joining together of two
separate oligonucleotide populations. Also demonstrated
is the isolation and characterization of peptide ligands
and their corresponding nucleotide sequence for specific
binding proteins.

Synthesis of Oligonucleotide Populations

A population of left half degenerate
oligonucleotides and a population of right half
degenerate oligonucleotides were synthesized using
standard automated procedures as described in Example I.

The degenerate codon sequences for each population
of oligonucleotides were generated by sequentially
synthesizing the triplet NNG/T where N is an equal
mixture of all four nucleotides. The antisense sequence
for each population of oligonucleotides was synthesized
and each population contained 5' and 3' flanking
sequences complementary to the vector sequence. The
complementary termini were used to incorporate each
population of oligonucleotides into their respective
vectors by standard mutagenesis procedures. Such
procedures have been described previously in Example I
and in the Detailed Description. Synthesis of the
antisense sequence of each population was necessary since
the single-stranded form of the vectors is obtained only
as the sense strand.
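The coding capacity of the NNG/T triplet described above can be enumerated directly. The following is a minimal sketch using the standard genetic code. The variable names are illustrative and the code is not part of the original procedure.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# NNG/T: any base at positions 1 and 2, G or T at position 3.
nnk_codons = ["".join(p) for p in product("ACGT", "ACGT", "GT")]

translated = {codon: CODON_TABLE[codon] for codon in nnk_codons}
amino_acids = {aa for aa in translated.values() if aa != "*"}
stop_codons = [codon for codon, aa in translated.items() if aa == "*"]

# The synthesized (antisense) triplet is the reverse complement, so its
# first position carries the complement of G/T.
antisense_first_bases = {revcomp(codon)[0] for codon in nnk_codons}

print(len(nnk_codons), len(amino_acids), stop_codons, sorted(antisense_first_bases))
```

The enumeration gives 32 codons covering all 20 amino acids with a single stop codon (TAG), and the antisense triplet begins with A or C, consistent with the A/CNN notation used for the synthesized oligonucleotides.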

The left half oligonucleotide population was
synthesized having the following sequence:
5'-AGCTCCCGGATGCCTCAGAAGATG(A/CNN)9GGCTTTTGCCACAGGGG-3' (SEQ
ID NO: 52). The right half oligonucleotide population
was synthesized having the following sequence:
5'-CAGCCTCGGATCCGCC(A/CNN)10ATG(A/C)GAAT-3' (SEQ ID NO: 53).
These two oligonucleotide populations when incorporated
into their respective vectors and joined together encode
a 20 codon oligonucleotide having 19 degenerate positions
and an internal predetermined codon sequence.



Vector Construction 

Modified forms of the previously described vectors
were used for the construction of right and left half
sublibraries. The construction of left half sublibraries
was performed in an M13-based vector termed M13ED03.
This vector is a modified form of the previously
described M13IX30 vector and contains all the essential
features of both M13IX30 and M13IX22. M13ED03 contains,
in addition to a wild type and a pseudo-wild type gene
VIII, sequences necessary for expression and two Fok I
sites for joining with a right half oligonucleotide
sublibrary. Therefore, this vector combines the
advantages of both previous vectors in that it can be
used for the generation and expression of surface
expression libraries from a single oligonucleotide
population or it can be joined with a sublibrary to bring
together right and left half oligonucleotide populations
into a surface expression library.

M13ED03 was constructed in two steps from M13IX30.
The first step involved the modification of M13IX30 to
remove a redundant sequence and to incorporate a sequence
encoding the eight amino-terminal residues of human
β-endorphin. The leader sequence was also mutated to
increase secretion of the product.

During construction of M13IX04 (an intermediate
vector to M13IX30 which is described in Example II), a
six nucleotide sequence was duplicated in oligonucleotide
027 (SEQ ID NO: 44) and its complement 032 (SEQ ID NO:
49). This sequence, 5'-TTACCG-3', was deleted by
mutagenesis in the construction of M13ED01. The
oligonucleotide used for the mutagenesis was
5'-GGTAAACAGTAACGGTAAGAGTGCCAG-3' (SEQ ID NO: 54). The
mutation in the leader sequence was generated using the
oligonucleotide 5'-GGGCTTTTGCCACAGGGGT-3' (SEQ ID NO:
55). This mutagenesis resulted in the A residue at
position 6353 of M13IX30 being changed to a G residue.
The resultant vector was designated M13IX32.

To generate M13ED01, the nucleotide sequence
encoding β-endorphin (8 amino acid residues of
β-endorphin plus 3 extra amino acid residues) was
incorporated after the leader sequence by mutagenesis.
The oligonucleotide used had the following sequence:
5'-AGGGTCATCGCCTTCAGCTCCGGATCCCTCAGAAGTCATAAACCCCCCATAGGCTTTTGCCAC-3'
(SEQ ID NO: 56). This mutagenesis also
removed some of the downstream sequences through the Spe
I site.



The second step in the construction of M13ED03
involved vector changes which put the β-endorphin
sequence in frame with the downstream pseudo-gene VIII
sequence and incorporated a Fok I site for joining with a
sublibrary of right half oligonucleotides. This vector
was designed to incorporate oligonucleotide populations by
mutagenesis using sequences complementary to those
flanking or overlapping with the encoded β-endorphin
sequence. The absence of β-endorphin expression after
mutagenesis can therefore be used to measure the
mutagenesis frequency. In addition to the above vector
changes, M13ED03 was also modified to contain an amber
codon at position 4492 for biological selection during
joining of right and left half sublibraries.

The mutations were incorporated using standard
mutagenesis procedures as described in Example I. The
frame shift changes and Fok I site were generated using
the oligonucleotide
TC0C™CTCC«G.T.CCT»=AACC.Ta^CCCCCC.T.==C-3. (SEQ
ID NO: 57). The amber codon was generated using the
oligonucleotide 5.-c™.TCCTAA.TCTT.CCAAC-3. (SEQ ID
NO: 58). The full sequence of the resultant vector,
M13ED03, is provided in Figure 8 (SEQ ID NO: 1).

The construction of right half oligonucleotide
sublibraries was performed in a modified form of the
M13IX42 vector. The new vector, M13IX421, is identical
to M13IX42 except that the amber codon between the Eco
RI-Sac I cloning site and the pseudo-gene VIII sequence
was removed. This change ensures that all expression off
the Lac Z promoter produces a peptide-gene VIII fusion
protein. Removal of the amber codon was performed by
mutagenesis using the following oligonucleotide:
5'-GCCTTCAGCCTCGGATCCGCC-3' (SEQ ID NO: 59). The full
sequence of M13IX421 is shown in Figure 9 (SEQ ID NO: 5).

Library Construction, Screening and Characterization of
Encoded Oligonucleotides

A sublibrary was constructed for each of the
previously described degenerate populations of
oligonucleotides. The left half population of
oligonucleotides was incorporated into M13ED03 to
generate the sublibrary M13ED03.L and the right half
population of oligonucleotides was incorporated into
M13IX421 to generate the sublibrary M13IX421.R. Each of
the oligonucleotide populations was incorporated into
its respective vector using site-directed mutagenesis
as described in Example I. Briefly, the nucleotide
sequences flanking the degenerate codon sequences were
complementary to the vector at the site of incorporation.
The populations of nucleotides were hybridized to
single-stranded M13ED03 or M13IX421 vectors and extended with T4
DNA polymerase to generate a double-stranded circular
vector. Mutant templates were obtained by uridine
selection in vivo, as described by Kunkel et al., supra.
Each of the vector populations was electroporated into
host cells and propagated as described in Example I.



The random joining of right and left half
sublibraries into a single surface expression library was
accomplished as described in Example I except that prior
to digesting each vector population with Fok I they were
first digested with an enzyme that cuts in the unwanted
portion of each vector. Briefly, M13ED03.L was digested
with Bgl II (cuts at 7094) and M13IX421.R was digested
with Hind III (cuts at 3919). Each of the digested
populations was further treated with alkaline
phosphatase to ensure that the ends would not religate
and then digested with an excess of Fok I. Ligations,
electroporation and propagation of the resultant library
were performed as described in Example I.

The surface expression library was screened for
ligand binding proteins using a modified panning
procedure. Briefly, 1 ml of the library, about lo'^ phage
particles, was added to 1-5 of the ligand binding
protein. The ligand binding protein was either an
antibody or receptor globulin (Rg) molecule, Aruffo et
al., Cell 61:1303-1313 (1990), which is incorporated
herein by reference. Phage were incubated with the
affinity ligand, with shaking, at room temperature for 1 to 3 hours
followed by the addition of 200 µl of latex beads
(Biosite, San Diego, CA) which were coated with
goat-antimouse IgG. This mixture was incubated with shaking for an
additional 1-2 hours at room temperature. Beads were
pelleted for 2 minutes by centrifugation in a microfuge
and washed with TBS which can contain 0.1% Tween 20.
Three additional washes were performed where the last
wash did not contain any Tween 20. The bound phage were
then eluted with 200 µl 0.1 M Glycine-HCl, pH 2.2 for 15
minutes and the beads were spun down by centrifugation.
The phage-containing supernatant (eluate) was removed and
phage exhibiting binding to the ligand binding protein
were further enriched by one to two more cycles of
panning. Typical yields after the first eluate were

about 1 x lo' - 5 X lo' pfu. The second and third eluate 
generally yielded about 5 x 10* - 2 x 10^ pfu and 5 x 
10^ - 1 X 10^° pfu, respectively. 

The second or third eluate was plated at a suitable
density for plaque identification screening and
sequencing of positive clones (i.e., plated at confluency
for rare clones and 200-500 plaques/plate if pure plaques
were needed). Briefly, plaques were grown for about 6 hours
at 37°C and were overlaid with nitrocellulose filters
that had been soaked in 2 mM IPTG and then briefly dried.
The filters remained on the plaques overnight at room
temperature and were then removed and placed in blocking
solution for 1-2 hours. Following blocking, the filters were
incubated in 1 ng/ml ligand binding protein in blocking
solution for 1-2 hours at room temperature. Goat
antimouse Ig-coupled alkaline phosphatase (Fisher) was
added at a 1:1000 dilution and the filters were rapidly
washed with 10 ml of TBS or block solution over a glass
vacuum filter. Positive plaques were identified after
alkaline phosphatase development for detection.

Screening of the degenerate oligonucleotide library
with several different ligand binding proteins resulted
in the identification of peptide sequences which bound to
each of the ligands. For example, screening with an
antibody to β-endorphin resulted in the detection of
about 30-40 different clones which essentially all had
the core amino acid sequence known to interact with the
antibody. The sequences flanking the core sequences were
different, showing that they were independently derived
and not duplicates of the same clone. Screening with an
antibody known as 57 gave similar results (i.e., a core
consensus sequence was identified but the flanking
sequences among the clones were different).

EXAMPLE IV 

Construction of a Left Half Random Oligonucleotide Library

This example shows the synthesis and construction of
a left half random oligonucleotide library.

A population of random oligonucleotides nine codons
in length was synthesized as described in Example I
except that different sequences at their 5' and 3' ends
were synthesized so that they could be easily inserted
into the vector by mutagenesis. Also, the mixing and
dividing steps for generating random distributions of
reaction products were performed by the alternative method
of dispensing equal volumes of bead suspensions. The
liquid chosen that was dense enough for the beads to
remain dispersed was 100% acetonitrile.

Briefly, each column was prepared for the first
coupling reaction by suspending 22 mg (1 µmole) of 48
µmol/g capacity beads (Genta, San Diego, CA) in 0.5 ml
of 100% acetonitrile. These beads are smaller than those
described in Example I and are derivatized with a guanine
nucleotide. They also do not have a controlled pore
size. The bead suspension was then transferred to an
empty reaction column. Suspensions were kept relatively
dispersed by gently pipetting the suspension during
transfer. Columns were plugged and monomer coupling
reactions were performed as shown in Table XII.

Table XII

Column        Sequence (5' to 3')

column 1L     AA(A/C)GGCTTTTGCCACAGG
column 2L     AG(A/G)GGCTTTTGCCACAGG
column 3L     AT(A/G)GGCTTTTGCCACAGG
column 4L     AC(A/G)GGCTTTTGCCACAGG
column 5L     CA(G/T)GGCTTTTGCCACAGG
column 6L     CT(G/C)GGCTTTTGCCACAGG
column 7L     AG(T/C)GGCTTTTGCCACAGG
column 8L     AT(T/C)GGCTTTTGCCACAGG
column 9L     CC(A/C)GGCTTTTGCCACAGG
column 10L    T(A/T)TGGCTTTTGCCACAGG

After coupling of the last monomer, the columns were
unplugged as described previously and their contents were
poured into a 1.5 ml microfuge tube. The columns were
rinsed with 100% acetonitrile to recover any remaining
beads. The volume used for rinsing was determined so
that the final volume of total bead suspension was about
100 µl for each new reaction column that the beads would
be aliquoted into. The mixture was vortexed gently to
produce a uniformly dispersed suspension and then
divided, with constant pipetting of the mixture, into
equal volumes. Each mixture of beads was then
transferred to an empty reaction column. The empty tubes
were washed with a small volume of 100% acetonitrile and
the washes were also transferred to their respective
columns. Random codon positions 2 through 9 were then
synthesized as described in Example I where the mixing and
dividing steps were performed using a suspension in 100%
acetonitrile. The coupling reactions for codon positions
2 through 9 are shown in Table XIII.



Table XIII

Column        Sequence (5' to 3')

column 1L     AA(A/C)A
column 2L     AG(A/G)A
column 3L     AT(A/G)A
column 4L     AC(A/G)A
column 5L     CA(G/T)A
column 6L     CT(G/C)A
column 7L     AG(T/C)A
column 8L     AT(T/C)A
column 9L     CC(A/C)A
column 10L    T(A/T)TA
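The mixing and dividing steps used between codon positions can be pictured with a small simulation. The sketch below is a simplified model, not the patented procedure: each bead is tracked only by the column it passes through at each of the nine codon positions, pooling is modeled as a shuffle, and dispensing equal volumes is modeled as dealing beads evenly into ten columns. The bead count and the function name split_pool are hypothetical.

```python
import random
from collections import Counter

COLUMNS = 10       # reaction columns per round (Tables XII and XIII)
ROUNDS = 9         # random codon positions in the left half library
N_BEADS = 10_000   # hypothetical bead count, chosen to keep the run fast


def split_pool(n_beads, columns, rounds, rng):
    """Return, for each bead, the list of columns it passed through."""
    histories = [[] for _ in range(n_beads)]
    for _ in range(rounds):
        order = list(range(n_beads))
        rng.shuffle(order)                    # pooling and mixing
        for slot, bead in enumerate(order):   # dividing into equal volumes
            histories[bead].append(slot % columns + 1)
    return histories


histories = split_pool(N_BEADS, COLUMNS, ROUNDS, random.Random(0))

# Beads per column in the final round, and number of distinct column paths.
last_round = Counter(history[-1] for history in histories)
distinct_paths = len({tuple(history) for history in histories})
print(dict(last_round), distinct_paths)
```

Because each column couples a two-fold mixed position, every column path in this model stands for two possible codons at each position. The simulation ignores that second level of mixing.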



After coupling of the last monomer for the ninth
codon position, the reaction products were mixed and a
portion was transferred to an empty reaction column.
Columns were plugged and the following monomer coupling
reactions were performed: 5'-CGGATGCCTC..GAAGCCCCXXA-3'
(SEQ ID NO: 60). The resulting population of random
oligonucleotides was purified and incorporated by
mutagenesis into the left half vector M13ED04.

M13ED04 is a modified version of the M13ED03 vector
described in Example III and therefore contains all the
features of that vector. The difference between M13ED03
and M13ED04 is that M13ED04 does not contain the five
amino acid sequence (Tyr Gly Gly Phe Met) recognized by
anti-β-endorphin antibody. This sequence was deleted by
mutagenesis using the oligonucleotide
5'-CGGATGCCTCAGAAGGGCTTTTGCCACAGG-3' (SEQ ID NO: 61). The
entire nucleotide sequence of this vector is shown in
Figure 10 (SEQ ID NO: 6).

Although the invention has been described with
reference to the presently preferred embodiment, it
should be understood that various modifications can be
made without departing from the spirit of the invention.
Accordingly, the invention is limited only by the claims.
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Huse, William D.

(ii) TITLE OF INVENTION: SURFACE EXPRESSION LIBRARIES OF
RANDOMIZED PEPTIDES

(iii) NUMBER OF SEQUENCES: 61

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Pretty, Schroeder, Brueggemann & Clark

(B) STREET: 444 South Flower Street, Suite 2000

(C) CITY: Los Angeles

(D) STATE: California

(E) COUNTRY: United States

(F) ZIP: 90071

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.23

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Campbell, Cathryn A.

(B) REGISTRATION NUMBER: j^1,815

(C) REFERENCE/DOCKET NUMBER: P31 9072

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (619) 535-9001

(B) TELEFAX: (619) 535-8949



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7294 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: both

(D) TOPOLOGY: circular

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
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CAGGGTAAAG ACCTGATTTT TGAT.TATGG TCATTCTCGT TTTCTGAACT G..iA-^^GCA 
TTTGAGGGGG ATTCAATG/.^ TATTIATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 
AAACATrTTA CTArTACCCC CTCTGGCAAA ACrTCTTTTG C.^AAAGCCTC TCGCTATTTT 
GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 
AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 
ATGAATCTTT CTACGTGTAA TAATGTTGTT CCGITAGTTC GTTTTATTAA CGTAGATTTT 
TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 
caatga™ AGTTGAAATT AAACCATCTC AAGCCCAATT TAGTACTCGT TCTGGTGTTT 
CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGGTrTG TTACGTTGAT TIGGGTA-^TG 
AATATCCGGT TCTTGTCAAG ATTACTCITG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 
TGTACACCGT TCATCTGTGG TCTTTCAAAG rTGGTCAGTT CGGTTCCCTT ATGATTGACG 
GTCTGCGCCT GGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 
CAGGCGATGA TACAAATGTG CGTTGTACTT TGTTrCGCGC TIGGTATAAT CGCTGGGGGT 
CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 
GTGGCATTAC GTATTTTACC CGTTrAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 
CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 
CGATCCCGCA AAAGCGGGCT TTA.^CTGCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 
TGCGTGGGCG ATGGnGTIG TCAriGTCGG CGCAACTATC GGTATCAAGC TGTrTAAGA.. 
ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGGCTTn 
TTTTTGGAGA TTTTGAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 
TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 
TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 
CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 
TGGGTrCCTA TTGGGCTrGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 
TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCXC CTGAGTACGG TGATACACGT 
ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 
AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTGTCAGC CTCTTAATAC TTTCATGTTT 
CAGAATAATA GGTIGCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 
CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 
TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 
GATCCATTCG TTIGIGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 
GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 
GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTrCCG GTGGTGGCTC TGGTTCGGGT 
GATTnGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 
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GAAAACGCGC TACAGTCTGA CGCTAAAGGC A.^_^CrrGATT CrGTCGCTAC TGATTACGGT 252 

GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CT.A.ATGGTAA TGGTGCTACT 258 

GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTC.i.^GTCG GTGACGGTGA TAATTCACCT 264 

TTAATGAATA ATTTCCGTCA ATATTTACCT TCCClCCCTC AATCGGTIGA ATGTCGCCCT 270 

TrrGTcmA GCGCTGGTA.A ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 27 6L 

TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATITTCTACG 2S2C 

TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880 

TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC -9-.C 

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000 

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060 

■ITGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 

TCTCTGTAAA CGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTrATTTGG 3180 

ATTGGGATAA ATAATATGGC TGTTTATnT GTA.ACTGGCA AATTAGGCTC TGGAAAGACG 2240 

CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCMCTAAT 3300 

CTTGATTTAA GGCrrCAAAA CCTCCCGCAA GTCGGGAGCT TCGCTAAAAC GCCTCGCGTT 3 360 

CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420 

TCCTACGATG AAAATAAAAA CGGCTTGC-^ GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480 

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT AaATGCTCGT 3540 

AAATTAGGAT GGGATATTAT CTTCCTTCTT CAGGACTTAT CTATTGTTGA TAAACAGGCC 3600 

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660 

TTrCTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720 

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTrGAGCG TTGGCTTTAT 3780 

ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840 

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960 

TGTCrrGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080 

CAGCGTCrTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 
AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 
ATTAAAAAGG TAATTCAAAT GAAATTGTTA AATGTAATTA ATTTTGTTTT CTTGATGTTT 
GTTTCATCAT CTTCTTTTGC TCAGGTAATT GAAATGAATA ATTGGCCTCT GCGCGATTTT 

GTAACTTGGT ATTCAAAGCA ATCAGGCGAA TCCGTTATTG TTTCTCCCGA TGTAAAAGGT 4380 

ACTGTTACTG TATATTCATC TGACGTTAAA CCTGAAAATC TACGCAATTT CTTTATTTCT 4440 

GTTTTACGTG CTAATAATTT TGATATGGTT GGTTCAATTC CTTCCATTAT TTAGAAGTAT 4500
T-G^CATCAT CTGATAATCA GGAATATGnT
AATCCAAACA ATCAGGATTA TAlTuAxoA. TTG.C.I ,,.,,,^cT 

^rrrTrm-T CT-CCGC^A.\ ATGATMTGT TACTCAAACT 
rATAATTCCG CTCCTTCTGG TCGTrrCTTT GTTCCbL 

™L™ ..... ....CO. ™.cc.x. 

CX«.TCC.C ™C0 OC.C..«^ ™ 

cxcccc. mccoxxc .ccccxc. ™ 

^cxnc cxocxcocxc xc^c.cxccc .cxcxxcc.o ococxcxxm ^ 

CXCCCXCXO XXXX.XCXXC X=CXCCXCCX XCCXXCCCX. xrm.X ™ 
CCCCX.XC.C XXCCCCC.X. ..C.CX.X .COC.XXC. ..X.XX ™- - 

™CCC TXXC.OCTC. C.CCCTTCX .XCXCXCXXC CCC.C.TOX c™x^ 
.CXCCXCCXO XCCXOCXC. .XCXCCCX OX..XMXC C.XXXC.C.C c^x^^- 

C..XCX.0 CX.XXXCC.X C.CCCXXXXX cacrxocM xcccxcccoc M n ^ 
CXCC.X.XX. cc^ccccc ..X.CXTXC ^oxxcrrcx. cxccccmc ^ x- 

.CXMXCA. C.OT.XXCC X.CAACC=XX MTXTCCCXO .XCC.CACAC XCXTXT.CXC 

Lccxc. cxc.™x. ..C.CXXCX C.C.XXCXC CCCX.CC3XX ccx™ 
^cccxrr. xcocccxccx cxxx.ccxcc cocxcxc.xx c.mc=.==. . - 

X.CCXOCTC. XC-^OCC C«.CX.-= CCCCXCX.OC CCCCC.XX. - 

x.x==xcoxx .cocccACcc XCACC.CX.C .CXXCCC.-.C ccccx^cccc cc accm 
cccxrxcrxc ccxxccrxxc xcccc^coxx cocccocxxx cccooxc.0 

OOGGCICCCX XXACacTXCC CAXXXAGXCC rXXACCCCAC CXCOACCCCA AMMCXXG. 

To XC.X ~x. CXCCCCC.XC ™x.o 

cxxcGAOXcc Acoxxcrrx. axa^ccacx cxxctxccm acxocaac cacxcm^ 

rXCXCCCOC XAXX^C AXXXAXMCC CA^CCCG AXXXCCO.C CACCA^^ 
CACGAXXXrC GCCXCCXCCC CCAAACCACC CXOC.CCCCX XCCXOCMCX « 
CACCCCOXOA AOOCCAAXCA OCXCXXCCCC CXCXGCCXCO XOAAAACA. MCCA-CT 
OCOCCCAAXA OOCAAACCCC CXCXCCCCCC OCOXTOOCCC AXXCAXXAAX CCA^ 
COACAGCrrX CCCCACXCOA AACCCOCCAC XOACCCCAAC OCAAXXAAXO XOACrXAOa 
"0 CCACCCCAOO CXXXACACXX XAXCCXXCCC oaCOTAXCX XOXOXCCAAX 
rr=OA XAACAA^C ACACACOAAA CAOCXAX.AC CACOAXOXAC OAA^ 
CXACGAOACC XCGGCGGAXC CXAGGCXOAA CGCGAXGAGC GTGCXAAGGC XGGAXTCAAX 
A^GC CAAGXGGXAC XCACXACAXX CGCXACGCXX CGGGXAXOCX AGX^X.^ 
G ™GCXA CGAXACCAX XAAAXXAXTG AAAAAGXXXA GGAGCAACGC 
GCXGGCGXAA XAGGGAAGAG GCCCGCACCC ATGCGCCXXG GCAACA ™ — 
.XGCCGAAXG GCGGXTTCCC XGGXTTCCGC CACCAGAAGC GCXGCCCGAA AGGXGGaGC
AGTGCGATCT TCCTGAGGCC GATACGGTCG TCGTCCCCTC AAACTGGCAG ATCCACGGTT 6600
ACGATGCGCC CATCTACACC AACGTAACCT ATCCCArTAC GGTCAATCCG CCGTTTGTTC 6660 
CCACGGAGAA TCCGACGGGT TGTTACTCGC TCACATTTAA TGTTGATGAA AGCTGGCTAC 6720 
AGGAAGGCa\ GACGCGAATT ATTrTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT 6780 
TTAACAAAAA TTTAACGCGA ATTTTAACAA AATATTAACG TTTACAATTT AAATATTTGC 6840 
TTATACAATC TTCCTGTTTT TGGGGCTTTT CTGATTATCA ACCGGGGTAC ATATGATTGA 6900 

CATGCTAGTT TTACGATTAC CGTTCATCGA TTCTCTTGTT TGCTCCAGAC TCTCAGGCAA 6960 

TGACCTGATA GCCTTTGTAG ATCTCTCAAA AATAGCTACC CTCTCCGGCA TTAATTTATC 7020 

AGCTAGAACG GTTGAATATC ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCC 7080 

TTTTGAATCT TTACCTACAC ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA 7140 

AAATTTTTAT CCTTGCGTTG AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA 7200 

TGTTTTTGGT ACAACCGATT TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGCTAA 7260 

TTCTTTGCCT TGCCTGTATG ATTTATTGGA CGTT 7294

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7320 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCnT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360 

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 

CAGGGTAAAG ACCTGATITT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTIGAGGGGG ATICAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GG TliliA TC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 
AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 
ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 
TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTIA AAATCGCATA AGGTAATTCA 840
CAATGATTA.. AGrrGAA..TT AA. CCATCTC A..GCCC.V..TT TACTACTCGT TCTGGTGTTT
CTCGTCAGGG C.AAGCCTTAT TCACTGMTG AGCAGCTTTG TTAGGTTGAT TTGGGTAATG 
AATATCCGGT TCrrGTC.^.AG ATTACTCTTG ATGMGGTCA GCCAGCCTAT GCGCCTGGTG 
TGTACACCGT TCATCTGTCC TCrTTCA-AAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 
GTCTGCGCCT CGTTCCGGCT A..GTA.ACATG GAGCAGGTCG CGGATTTCGA CACA-ATTTAT 
CAGGCGATGA TACAAATCTC CGTTGTACTT TGrTTCGCGC TTGGTATAAT CGCTGGGGGT 
GAAAGATGAG TGTTTTAGTG TATTCTTrCG GCTCT-CGT TTTAGGTTGG TGCGTTGGTA 
GTGGCATTAC GTATTTTACC CGTriAATGG AAACnCCTC ATGAAAAAGT GTTTAGTCCT 
CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TGCGATGCTG TCrTTCGCTG CTGAGGGTGA 
CGATCCCGCA AAAGCGGCCT TTA..CTCCCT GCAAGCCTCA GCGACCGA..T ATATCGGTTA 
TGCGTGGGCG ATGGTTGrTG TCATTGTCGG CGCA..CTATC GGTATCAAGC TGTTTAAGAA 
ATTCACCTCG AAAGCAAGCT GATA^^L^CCGA TACA-^TTAAA GGCTCCTTTT GGAGCGTm 
TTTTTGGAGA TTTTC.AACGT GA.^^^ATTA TTATTCGCA.^ TTCCTTTAGT TGTTGCTTTC 
TATTCTCACT CCGCTG.A.^.^C TGTTG.^GT TGTTrAGCAA .WCCCCATAC AGAAAATTCA 
TTTACTAAGG TCTGG.^A.AG.-'.. CGAC.^^_^CT TTAGATCGTT ACGCTA-^CTA TGAGGGTTGT 
CTGTGGAATG GTAGACGCGT TCTAGTTTGT AGTCGTGACG AAAGTCAGTG TTACGGTACA 
TGGGTTCCTA TTGGGCtTC:: TATCCCTG.^. AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 
TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTA^^^GCTC CTGAGTACGG TGATACACGT 
ATTCCGGGCT AIAGTTATAT CA.CC:TCTG GACGGCACTT ATGCGGGTGG TACTGAGGAA 
AAGGCGGGTA ATGGTA..TCC TTCTGTTGAG GAGTGTGAGG GTCTTAATAG rTTGATGTTT 

cagaataata ggttccgaaa taggcagggg gcattaactg tttatagggg cactgttact 

CAAGGCACTG ACCCCGTTAA AAGTrATTAG CAGTACACTC CTGTATCATC AAAAGCCATG 

TATGACGCrrr agtggaacgg taaattcaga gactgcgctt tccattctgg ctttaatgaa 

GATCCATICG mCTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 
GCTGGCGGCG GCTCTGGIGG TGGrrGTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 
GGCGGTTCTG AGGGTGGCGG GTCTGAGGGA GGCGGTTCCG GTOnGGCTC TGGTTCCGGT 
GATTTTGArr ATGAAAAGAT GGCAAACGGT AATAAGGGGG CTATGACCGA AAATGCCGAT 
GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAG TGATTACGGT 
GCTGCTATCG ATGGTTTCAT TGGTGACGTT TGCGGCGTTG CTAATGGTAA TGGTGCTACT 
GGTGATTTTG CTGGCTCTAA. TTCCCAAATG GCTCAAGTGG GTGACGGTGA TAATTCACCT 
TIAATGAATA ATrrCCGTCA ATATTTAGCT TCCCTCGGTC AATCGGTTGA ATGTCGCCCT 

rrrGTCTTTA gcggtggtaa accatatgaa ttttctattg attgtgacaa aataaactta 

TTCCGTGGTG TCTTrGGGTr TCTTTTATAT GTTGGCACCT TTATGIATGI ATTTTCTACG 
^GCTAACA TACTGGGTAA TAAGGAGTGT TAATGATGCG AGrTGTTTTG GGTATTCGGT

TATTATTGCG TTTCCTCGGT TTCCrrCTGG TA-^CmCTT CGG'^*ATCTG CTTACTTTTC 2940

TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT [illegible] CTTATTATTG 3000

GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCA-MTA CCCTCTGACT 3060

TTGTTCAGGG TGTTCAGrTA AriCTCCCGT CTAATGCGCT tgcctgttt: TATGTTATTC 3120

TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA [illegible] TCTTATTTGG 3180

ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGA.\AGACG 3240

CTCGTTAGCG TTGGTAAGAT TTAGGATAAA ATTGTAGCTG GGTGCAAAA7 AGCAACTAAT 3300

CTTGATTTAA GGCrrCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCClGGGGTi 3360



CTTAGAATAC CGGATAAGCC TTGTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420 

TCCTACGATG AAAATAAAAA CGGGTIGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480 

ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540

AA/.TTAGGAT GGGATATTAT CTTCCrTGTT CAGGACTTAT CTArTGTTGA TAAACAGGCG 3600

CGTTCTGCAT TAGCTGAACA TGTTCTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTGTGCC TAAATTACAT 3720

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780

ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TA.^^TTATGAT 3840

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTC AGAAGATGAA ATTAACTAAA ATATATTTGA AAAAGTTTTC TCGCGTTCTT 3960 

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080 

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATrTATG TACTGTTTCC 4200 

ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320 

TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380 

TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440

TGTTTrACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500 

TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560 

TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620 

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680 

GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740 

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTGTACTG TTGATTTGCC 4800 

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860 

TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920
^^^^^rr-rr- TTCGn"r.'- ATTTTT^ATG GCGATCTTTT
CCTCACCTCT GTTrTATCTT CTGCTGGiG-- TTCGli.^-- n 

^ ,-.,^r>rT'i rArrC'—C\ AAAATATTGT CTGTGCCACG 
AGGGCTATCA GTTCGCGCAT TAAAG^CT^ TAGCC..--L.. 

TATTCTTACG GTnCAGGTG AGAAGGGTTC TATCTGTGTT GGCGAC.^ATG TCGCmXAT 
TACTGGTCGT GTGACTGGTG AATCTGCC...A TGTAAATA.AT CCATITGAGA CGATTGAGCG 
TCAAAATGTA GGTArTTGGA TGAGCGTTTT TGCTGrTGCA ATGGCTGGCG GTA.ATATTGT 
TGTGGATAn ACGAGGAAGG GGGATAGTTT GAGncrrCT AGTGA:GCA.^ GTGATGTTAT 5280
TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTGTTrTACT 
GGGTGGGCTG AGTGATTATA AA^GAGTTG TGAAGArTCT GGCGTACGGT TGCTGTGTA.A 
AATGCCTTTA ATGGGGGTGG TGTTTAGGTG CCGCTCTGAT TCC.VXAGG AAAGCACGTT 
ATACGTGCTC GICAAAGCAA GCATAGTACG CGCCCTGTAG ^GGCGCATTA AGCGCGGGGG 
GTGTGGTGGT TACGCGCAGC GTGACCGGTA CACrTGCCAG CGCCCTAGCG CCCGCTCCTT 
TCGCTTTCrr CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TGCCCGTGAA GCTGTAAATC 
GGGGGCTCCC rrTAGGGTTC CGATTTAGIG CTTTACGGCA CCTCGACCCC AAAAAACTTG 
ArrrGGGTGA TGGTTGACGT AGTGGGGGAT GGGGCTGATA GACGCnTTTI GGGGCTTTGA 5760
CGTTGGAGTC CACGTTGm AATAGTGGAC TGITGriGCA .AACTGGAACA ACAGTCAAGG 5820
GTATGTGGGG GTATTCmT GAXn-ATA^AG GGATTrrGCC GATTICGGAA GCAGCATCA.X 5880
ACAGGATTTT CGCCTGCTGG GGCA.A.ACCAC -^GTGGACCGC TTGGTGCAAG TGTGTCAGGG 
CCAGGCGGTG AAGGGCAATC AGCTGrTGCC CGTCICGCTG GTGAA.AAGAA AAACCACCCT 
GGCGCGGAAT AGGGAAACGG CCTCTCGCGG GGGGT^GGGG GATICATTAA TGCAGGTGGC 
ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 
TGAGTGATTA GGCAGGCCAG GGTTIACAGT TTATGCnGG GGCTGGTATG TTGTGTGGAA 
TTGTGAGCGG ATAAGAATTT GAGACGGCAA GGAGACAGTG ATAATGAAAT ACCTATTGCC 
TACGGCAGCC GGTGGATIGT TATTACTGGC TGCGCAACCA GCCATGGCCG AGCTCGTGAT 
GACCCAGACT CCAGAATTCC ATGGGGAATG AGTGTTAATT CTAGAACGGG TAAGCTIGGC 
ACTGGCCGTC GTmACAAC GTCGTGAGTG GGAAAACCCT GGGGTTACCC AACTTAATCG 
CCTTGCAGCA CACCGCCCTT TGGCCAGCTG GCGTAATAGG GAAGAGGCCC GCACCGATCG 
GCCTTGGCAA GAGTTGGGGA GCGTGAATGG GGAATGGGGC TTTGCGTGGT TTGCGGGACG 
AGAAGGGGTG GGGGAAAGCT GGGTGGAGTG GGATCTTCGT GAGGCGGATA GGGTCGTGGT 
GGGGTGAAAG TGGGAGATGC ACGGTTAGGA TGGGCCGATC TACACCAAGG TAACCTATCC 
GATTAGGGTC AATGGGGCGT TTGrrGGCAC GGAGA.ATCCC ACGGGTTGTr ACTCGCTCAC 
ATTTAATGTT GATGAAAGGT GGCTACAGGA AGGGCAGAGG CGAATTATTT TTGAIGGCGT 6780 
TCCTATTGGT TAAAAAATGA GCTGATTTAA CAAAAATTTA ACGCGAATTT TAACAAAATA 
rrAAGGTTTA GAAriTAAAT ATTIGGTIAT AGAATGTTGC TGmTTGGG GCTTTTCTGA 
TTATCAACCG GGGTACATAT GATTCACATG CTAGTnTAC GArrACCGTT CATCGATTCT
CTTGTTTGCT CCAGACTCTC AGGCAATGAC CTGATAGCCT TTGTAGATCT CTCAAAAATA
GCTACCCTCT CCGGCATTAA TTTATCAGCT AGA.\CGGTTG /^.MATCATAT TGATGGTGAT 
TTGACTGTCT CCGGCCTTTC TCACCCTTTT GAATCTTTAC CTACACATTA CTCAGGCATT 
GCATTTAAAA TATATGAGGG TTCTAAAAAT TTTTATCCTT GCGTTGAAAT AAAGGCTTCT 
CCCGCAAAAG TATTACAGGG TCATAATGTT TTTCGTACAA CCGATTTAGC TTTATGCTCT 
GAGGCTTTAT TGCTTAATTT TGCTAATTGT TTGCCTTGCC TGTATGATTT ATTGGACGTT 

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7445 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAA.\AT 60

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 

GTTGCATATT T.AAAACATGT TGAGCTACAG CACCAGATTC AGCAATTA.AG CTCTAAGCCA 240 

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA GTATAATAGT 420 

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 

AATTCCTTTT GGCGT7ATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGCTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTrCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
GTGGCATTAC GTATriTACC CGTT7.AATGG .^CTrCCIC ATG-^^.GT CmAGTCCT
CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTr TCTTICGCTG CTGAGGGTGA 
CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 
TGCGTGGGCG ATGGTTGTTG TCArTGTCGG CGC.A.ACTATC GGTATCAAGC TGTITAAGAA 
ATTCACCTCG AAAGCAAGCT GAT.^^CCGA TACA-ATTAA.. GGCTCCTTTT GGAGCCTTTT 
TTrrrGGAGA XmCAACGT G.AA.VAA^TTA rrAnCGCA.A TrCCTTTAGT TGrrCCITTC 
TATTCTCACT CCGCTGAA.AC TGTTG.^AGT TGrrrAGCA.X .A.ACCCCATAC AG-_AATTCA 
TTTACTA^ACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGriGT 
CTGTGGAATG CTACAGGCGT TGTAGITTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 
TGGGTTCCTA TTGGGCTTGC TATCCCTGA.A AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 
TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 
ATTCCGGGCT ATACTTATAT CA-ACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCrTAATAC TnCATGTTT 
CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG riTATACGGG CACTGTTACT 
CAAGGCACTG ACCCCGTTAA AACTTATTAG CAGTACACTC GTGTATCATC AAAAGCCATG 
TATGACGCTT ACTGGAACGG T.AAATTCAGA GACTGCGCTT TCCArTCTGG CTTTAATGAA 
GATCCATTCG TTTGTGAATA TCA.AGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 
GCTGGCGGCG GCTGTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CLnGAGGGT 2340 
GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 
GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 
GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTAGGGT 
GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAAXGGTAA TGGTGCTACT 
GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 
TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGXTGA ATGTCGCCCT 
TTTGTCTrrA GCGCTGGTAA ACCATATGAA TTTTCTATTG AriGTCACAA AATAAACTTA 
TTCCGTGGTG TCTTTGCGTT TCrrrTATAT GTrCCCACCT nATGTATGT ATTTrCTACG 
TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCrrrrG GGTATTCCGT 
TATTATTGCG TTTCCTCGGT TTCCTrCIGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 
^AAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTncrrGCT CnATTATTG 
GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 
TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTrr TATGTTATTC 
TCTCTGTAAA GGCTGCTATT TTCArrrrTG ACGTTAAACA AAAAATCCrT TCnATTIGG 
^TTGGGATAA ATAATATGGC TGTTTATTrr GTAACTGGCA AATrAGGGTC TGGAAAGACG 
CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT
CTTGATTTA.^ GGQ-ChAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT
CTTAGAATAC CGGAT.AAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 
TCCTACGATG AAAATAAA-AA CGGCT7GCTT GTTCTCGATG AGTGCCGTAC TTGGTTTAAT 
ACCCGTTCTT GGAATGATAA GC.AAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 
AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TA.A.ACAGGCG 
CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 
TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 
GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 
ACTGGTAAGA ATTTGTATA.A CGCATATGAT ACTAAACAGG CmTTCTAG TAATTATGAT 
TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 
AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 
TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 
GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 
CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGCGAAA ATTAATTAAT 
AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 
ATTAAAAAAG GTAATTC.AA.A TGAAATTGTT AAATGTAATT AATTTTCTTT TCTTGATGTT 
TcrrrcATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCGTC tgcgcgattt 
TGTAACTTGG TATTCAAACC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 

tactgttact gtatattcat ctgacgttaa acctgaaaat ctacgcaatt tctttatttc 

TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 
TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 

tgataattcc gctccttctg GIGGTITCTT tgttccgcaa aatgataatg ttactcaaac 

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 

gtctaatact tctaaatcct caaatgtatt atctattgac ggctctaatc tattagttgt 

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 
AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTr CAGCAAGGTG ATGCTTTAGA 
TTTrrCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 
CCTCACCTCT GTrTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 
AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCGACG 
TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT GGCCAGAATG TCCCTTTTAT 
TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 
TCAAAATGTA GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 
TCTGGATATT ACCAGCAAGG CCGATAGTTr GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 
TACTAATCAA AGAAG7ATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT
CGGTGCCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA
AATCCCrrXA ATCGGCCTCC TGTTTAGCTC CGGGTCTGAT TCCA.^CGAGG AAAGCACGTT 
ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 
GTGTGGTGGT TACGCGCAGC GIGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 
TCGCrrTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 
GGGGGCTCCC TrTAGGGTrC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 
ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCnTGA 
CGTTGGAGTC CACGTTCTTr AATAGTGGAC TCTTGTTCCA A.VCTGGAACA ACACTCAACC 
CTATCTCGGG CTATTCTITr GATTTATAAG GGATTTTGCC GA^rTCGGA.^ CCACCATCAA 
ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGAGCGC TTGCTGCAAC TCTCTCAGGG 
CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTC GTGAAAAGAA AAACCACCCT 
GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGT^GGCC GATTCATTAA TGCAGCTGGC 
ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCA.--. CGCAATTAAT GTGAGTTAGC 
TCACTCATIA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 
TTGTGAGCGG ATAACAATTT CACACGCGTC ACTTGGCACT GGCCGTCGTT TTACAACGTC 
GTGACTGGGA AAACCCTGGC GTTACCCAAG CTTTGTACAT GGAGAAAATA AAGTGAAACA 6300 
AAGCACTATI GCACTGGCAC TCTTACCGTr ^CCGTTAGTG TTTACCCCTG TGACAAAAGC 
CGCCCAGGTC CAGCTGCTCG AGTCAGGCCT ATTGTGCCCA GGGGATTGTA CTAGTGGATC 
CTAGGCTGAA GGCGATGACC CTGCTAAGGC TGCATTCAAT AGTTTACAGG CAAGTGCTAC 
TGAGTACATT GGCTACGCTT GGGCTATGGT AGTAGTTATA GTTGGTGCTA CCATAGGGAT 
TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAAGCA ATAGCGAAGA GGCCCGCACC 
GATCGCCCTT CCCAACAGn GCGCAGCCTG AATGGCGAAT GGCGCTTTGC CTGGTTTCCG 
GCACCAGAAG CGGTGCCGGA AAGCTGGCTG GAGTGCGATC TTCCTGAGGC CGATACGGTC 
GTCGTCCCCT CAAACTGGCA GATGCACGGT TACGATGCGC CCATCTACAC CAACGTAACC 
TATCCCATTA CGGTCAATCC GCCGTTrGTT CCCACGGAGA ATCCGACGGG TTGTTACTCG 
CTCACATrrA ATGTIGATGA AAGCTGGCTA CAGGAAGGCC AGACGCGAAT TATTTrTGAT 
GGCGTTCCTA TTGGTTAAAA AATGAGCTGA TTTAACAAA.^ ATTTAACGCG AATTTTAACA 
AAATATTAAC GTrTACAATT TAAATATTTG CTTATACAAT CTrCCTGTTT TTGGGGCTTT 
TCTGATTATC AACCGGGGTA CATATGATTG ACATGCTAGT TTTACGATTA CCGTTCATCG 
ATTCTCrrGT TTGCTCCAGA CTCTCAGGCA ATGaCCTGAT AGCCTTTGTA GATCTCTCAA 
AAATAGCTAC CCTCTCCGGC ATTAATTTAT CAGCTAGAAC GGTTGAATAT CATATTGATG 
GTGATriGAC TGTCTCCGGC CTTrCTCACC CTTTTGAATC rTTACCTACA CATTACTCAG 
GCArrGCATT TAAAATATAT GAGGGTTCTA AAAATTTTrA TCCTTGCGTT GAAATAAAGG 
CTTCTCCCGC AAAAGTATTA CAGGGTCATA ATGTTTTrGG TACAACCGAT TTAGCrTTAT
GCTCTGAGGC TTTATTGCTT AATTTTGCTA ATTCTTTGCC TTGCCTGTAT GATTTATTGG 7440
ACGTT 7445

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7409 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:



AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAAIGAAAAT 60

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCT^ ATGGTCAAAC TAAATCTACT 120

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAA.^ CrrCCAGACA CCGTACTTTA 180

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CrCTAAGCCA 240

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360

TCTTTCGGGC TTCCTCTTAA TcrrrrrGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480

TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540

AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600

GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960

AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260

GTGGCATTAC GTATTTTACC CGTTTAATGG AAAGTTCCTC ATGAAAAAGT CTTTAGTCCT 1320

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440

TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500

.XTCACCTCG AAAGC.ACCT CATA..^CCC. TACA..n..^^ CCCTCCTTTT CGACCCTTTT

mriGGAGA rrrrcAAGGT GMA-^^^riA ttattcgca.-.. ttccttiagt TGTTCcrrrc 

TATTCTGACT CGGGTGAAAC XGriGA^.G! TGTTTAGCA-. AACCCCATAC AGAAAATTCA 

^^-r-p.A.PA CGAC^'^CT rrAGATCGTT ACGCTAACTA TGAGGGTTGT 
TTTACTAACG TCTGGAAAGA CGAU'iAi--'i^ J. 

CTOTCCMTC CT.C*C.CCT TCT.CTTTCT A=.CCiaA=C AMCIC.CTO mCCCT.CA 
..OC^CI. TTGCCCITCC TATCCCVG.. GTCCCTCICA GGGTGGCCGT 

ICIGAGGOIG GCGGTTGTC* GCGIGGCGGI ACI^«CGTC GIGAGI.CGG TGATAGACGT 
■nCCGGCGT AIACTTATAT GAACGGTCTC GACGGGACrr ATCCGGGIGG TACTGAGGAA 
^CCCCGCTA ATGCTAAICC TTCTCnGAG CACTCTGAGC CIGTTAATAG TTXCATGTTI 
CAGAAIAATA GGTTCCGAAA lAGGGAGGGG GCATTAAGTG ITIATAGGGG GACTGtTACI 
CAAGGCAGTG ACCCGGTIAA AACITATTAC GAGTAGAGTG CIGTATCATG AAAAGGGATG 
TATGACGCTI AGTGGAACGG TAAATTGAGA GACTGGGGTT TCCAITGIGG GiriAATGAA 
GATGCATICG -IGTGAATA TCAAGGCCA. TGCTCIGACC TGGOTCAAGG TCCTGTCAAT 
.OrCGCGGCG GGTCTGGTGC TGGTTCTGCT GGCGGCTGTG AGGGTGGTGG CTGTGAGGGT 
GGCGGTTGTG ACGGTGGCGG CTGTGAGGGA GGCGGTT GCG GTGGTGGGTG TGGTTGGGGT 
GATTTIGAn ATGAAAAGAT GGCAV.CCCT A«AAGGGGG CTATGAGGGA AAATGGGGAT 
GAAAACGGGC TACAGTGTGA CCCtAAAGGC AAACTIGATI GTGTGGCTAG TGATTAGGGT 
GCIGCTATCG ATGGTTICAT TGGTGACGrT TCCGGCGTrG GTAATGGIAA TGGTGGTAGI 
GGTGATTITG CICGCTGTAA TTCCCAAATG GCTCAAGTCG GTGAGGGIGA TAATTGACGT 
XTAATGAATA ATTTCGGTGA ATAITIAGCT IGCGTGGCTG AATGGGTtGA ATGIGOGCGT 
TITGTGTrrA GGGCTGGTAA ACCATATGAA TnTGTATTG ATTGIGACAA AATAAACTTA 
riGCGIGGTG TGrrTGCGTI TGTTTTATAT GTtGGGACGT TIATGTATGT ATmGTACG 
TrrCGTAAGA TAaGCGTA.^ TAAGGAGTCT TAATGATGGG AGTrcntTTG CGTATIGCGT 
TATTATIGCG TTrGCTGGGT TIGCTIGTGG TAAGTTIGTr CGGGTATGIG GTIAGTrnG 
XTAAAAAGGG CXTGGGTAAG ATAGGTATTG GTATTTCATr GTTTGrrOGT CTtATIATTG 
GGOTAAGTG AATIGTIGTG GGTTATGTGT CIGATATOG GGGTGAATTA GGCTGTGACT 
TIGTIGAGGG TGrTGAGTIA ATtCTCGGGT GTAATGCGGT TGCCTGTTIT TATGTTATTG 
TGTGTOTAAA GCCTGCTATT rrCATTTTTG AGGTIAAAGA AAAAATGGTT TCTTATTtOG 
ATAATATGGC TGTTTATrTT GTAAGTGGGA AATTAOGGTC TGGAAAGAGO 
CTGGTTAGGG TIGOTAAGAt TGAGGATAAA ArTGTAGCTG GG.GOAAAAT AGCAAGTAA.T 

crrGA-rrrAA gggttgaaaa ggtgcgggaa gtgcggaggt tgggtaaaag ggctggggtt 

CTIAGAATAG GGGATAAGGC TtGTAIATGI GATrrGGTTG GTATTGOGCG GGGTAATGAT 
TGCTAGGATG AAAATAAAA.A GGGCTIGGTI GTrGTGGATG AGTGGGGTAC TTGGrrtAAT 
.GGGGTIGTr GGAATGATAA GGAAAGACAG GCGATTATIG ATTGGTTICT ACATGGICGI

AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAA^^CAGGCG 3600

CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660

TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780

ACTGGTAAGA ATTTCTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840

TCCGGTGTTT ATTCTTATTT AACGCCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900

AATITAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTCACTAT TGACTCTTCT 4080

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTA^TTAAT 4140

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200

ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 

TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320 

TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380 

TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCA.\TT TCTTTATTTC 4440

TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTC.\Arr CCTTCCATAA TTCAGAAGTA 4500 

TAATCCAAAC AATCAGGATT ATATTGATGA -^TTGCCATCA TCTGATAATC AGGAATATGA 4560 

TGATAATTCC GCTCCTTCTG GTGGTTTCrr TGTTCCGCAA AATGATAATG TTACTCAAAC 4620

TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680 

GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740 

TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800 

AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860 

TTTTTCATTr GCTGCTGGCT CTCAGCGTGG GACTGTTGCA GGCGGTGTTA ATACTGACCG 4920 

CCTCACCTCT GTTTTATCTr CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980 

AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040 

TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTI GGCCAGAATG TCCCTTTTAT 5100 

TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160 

TCAAAATGTA GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 5220 

TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280 

TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTrTTACT 5340 

CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400 

AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460 

ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520 

GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTACCC CCCGC7CCTT 5580
TCGCTTTCn CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640
GGGGGCTCCC TTTAGGGTrC CGAmAGTG CTTTACGGCA CCTCGACCCC AA.A.AAACTTG 
ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 
CGTIGGAGTC CACGTTCrrT AATAGTGGAC TCTrGTTCCA AACTGGAACA ACACTCAACC 
CTATCTCGGG CTATTCmT GATTTATAAG GGATTTTGCC GATTTCGGAA GCACCATCAA 
ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTCTCAGGG 
CCAGGCGGTG AAGGGC.A.ATC AGCTGrTGCC CGTCTCGCTG GTGAA.AAGAA A.A.ACCACCCT 
GGCGCGCAAT ACGCAA.ACCG CGTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGG 
ACGACAGGrr TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 
TCACTCATTA GGCAGCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 
TTGTGAGCGG ATAAGAATTT CACACGCGTC ACTrGGCACT GGCCGTCGTT TTACAACGTC 
GTGACTGGGA AAACCCTGGC GriACCCAAG CTrTGTACAT GGAGAAAATA AAGTGAAACA 
AAGCACTATT GCACTGGCAC TCrrACCGTT ACTGriTACC CCTGTGGCAA A-AGCCTATGG 
GGGGTTTATG ACTTCTGAGG GATCCGGAGC TGA-.GGCGAT GACCCTGCTA AGGCTGCATT 

CAATAGrrrA caggca.agtg ciactgagta cattggctac gcttgggcta tggtagtagt 

TATAGTIGGT GCTACCATAG GGArrAAATT ATT^AAAAAG TTTACGAGCA AGGCTTCTTA 
AGCAAI^GGG AAGAGGCCGG GACCGAT-G CCnCCGA..C AGTTGGGCAG CCTGAATGGC 
GAATGGCGCT TTGCCTGGTr TCCGGCACCA GAAGCGGTGC GGGAAAGCTG GCTGGAGTGG 
GATCTTCCTG AGGCCGATAC GGTCGTGGTC CCCTCAAACT GGCAGATGCA CGGTTACGAT 
GCGCCCATCT ACAGCAAGGT AAGGTATCGC ATTAGGGTCA ATCCGCCGTT TGTTCCCACG 
GAGAATCCGA CGGGTIGTrA CTGGCTCACA TTTAATGTTG ATGAAAGCTG GGTACAGGAA 
GGCCAGACGC GAATTATTTT TGATGGCGTT CCTATTGGTT AAAAAATGAG CTGATTTAAC 
MAAATITAA CGCGAATTTT AACAAAATAT TAACGTTTAG AATTTAAATA rTTGCTTATA 
CAATCTTCCT GTTTTrGGGG CnTTCTGAT TATCAACCGG GGTACATATG ATTGACATGC 
TAGTrriACG ATTAGCGTrC ATCGATTCTC nGTTrGCTC CAGACTCTGA GGCAATGACC 
TGATAGCCrr TGTAGATCTC TCAAAAATAG CTACCCTCTC CGGCATTAAT TTATGAGCTA 
GAAGGGTTGA ATATGATATT GATGGTGATT TGAGTGTGTC GGGGGTTTGT GAGGCTTTrG 
AATCTTTAGG TACAGATTAC TCAGGCATTG CATrTAAAAT ATATGAGGGT TCTAAAAATT 
TTTATCCTTG CGTIGAAATA AAGGGTTCTC CCGCAAAAGT ATTACAGGGT CATAATCTTT 
TIGGTACAAC GGATTrAGGT TTATGCTCTG AGGCTTTAn GCTTAATTTr GCTAATTCTI 
TGCCTTGCCT GTATGATTTA TTGGACGTT 

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7294 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



.A^^TGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAC CTCGCGCCCC AAATGAAAAT 60

ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120

CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180

GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240

TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300

TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360

TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420

CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480

TTTGACGGGG ATTCAATGAA TATTTATGAC GATTCCGCAC TATTGGACGC TATCCAGTCT 540



AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 

GGTTTTTATC GTCGTGTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 

AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 

ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 

TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 

CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 

CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 

AATATCCGGT TCTTGTCAAG ATTAGTCTTG ATGAAGGTCA GGCAGCCTAT GCGCCTGGTC 1020 

TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 

GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140 

CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 

CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 

GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 

CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 

CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 

TGCGTGGGCG ATGGTTGTTG TCATTGTGGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 

ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560 

TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 

TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680 
rr.r,- ^^' rriG/TCGTT f.CGCTWCTA TGACGGTrCT 
TTTACTAACC TGTGGAAAGA CGACA.^-«, TO&.l ^^.rr'TACA 
C.GXOOAATG C.AGAGGCCX TG.AGXrTGX AG.CG.GACG AAAG.a.CXG 
.GGGTTCCTA TICGCCXTGC TATGCGTCAA A..TGACGCTG CTCGCTCTCA GGOTOG GG 
XCTGACGGIG GCOCTTCTGA GCGXGGCGGT ACTAAACCXC CTOAGXACGG XGATACAGCX 
.XXGGCCGGX AXAGXXAXAX CAAGGCXCXC GAGGGC.GXX AXGGGCCXGG XACXGAGCM 
..GGCCGGXA AXCGXAAXCC XTGTCrTCAG GAGXGXCAGC GIGTXAATAC TTrc.XGXX 
CAGAAXAAXA OGnCCGAAA XAGGGAGGGG GCAXXAAGTG TXTAXACCGG CACXCXXAC 
CAAGGCACXG ACGGGGrTAA AACXIATXAC GAGXACACXG CXCXAXGAXG AAAAGCCATG 
TAXGAGCGXX ACTGGAACGO TAAATTCAGA GAGXGGGCXX XGGAXTCXGG GrXXAAXGM ^ 
GAXCGATXGG XTXGXGA..XA XGAAGGGCAA XGCXOXGACC XGCGXGAAGC XGGXG AA 
OGTGOCGOGC CGXGXGGXCG XGGXXGXGGI GGCGCGXGXG ACGGTGGXGC CTGXGAGGO 
GGGCGXTCXG AGGGXGGCGC CXCXGAGCGA CGCGGTrcGC GXGCIGGGTC XGGTXCCGGX 
GATrrXGATX AXGAAAAGAX GGGAAAGGGX AATAAGGGGG CXAXG.CGGA AAAXGGCGAX 
OAAAACGCGG TAGAGXGXGA CGGXAAAGGC AAACTTGATT GTGXCGGXAG XCAXXAGGGI 
GCXGGXAXCG AXGGXXTCAX XGCXGACGXT TGCGGCGTTC CXA.XGGXA. TGCXGGTA X 
GGXGATTTXG GXGGGXCXAA XXGGCAAAXG GGXGAAGTGG GXGAGGGXGA XAAXTCACGX 
XTAAXGAAXA AriTGGGTGA AXATTXAGGX XCGCXCCCTC AAXCGGXXGA AXGXCGGGCT 
XXTGXCTXTA GGGCtCGTAA ACGATAXGA/. TTrTCXAXTG AXTGTGACAA AAXAAACTTA 
XXCCGXGGXG XCX^GGGXT XCXXTTAXAX GXXGCCAGCX XXAXGXAXCX AXXXX^XACG 
XXTOGXAACA XACXGGGXAA XAAGGAGXCT XAAXCAXCCG AGrroXTTTG GGXATKCGX 
XAXTAXXCGG XTXCGXGG.r XXGCXXGXGG XAACTXXCTX CGGGXATCXG CXTACXm 
^AAAAAGGC GrXCGGXAAG AXAGCXAXXG GXAXXXGATX GTTXGXXGCX GXX™ 
GCGTXAACXC AAXXGXXCXG GCXXAXCXGX CXGAXATXAG CGCXCAAXXA 
XXGTTCAGGG XGXXCAOTXA AXXGXCGCGX GXAAXGCOCT TCCCXGTrrT XATGXTATIC 
XCXGXGXAAA GGCXGCXArX XXCAXXXXXG AGC^AACA AAAAAXCGrX TCrXAmC 
.XTCCGATAA AXAATAXGGG TGTXXArXXT GXAACXGGCA AATIAGGCXC XOGAAAGAGG 
GXCGTXAGGO TXGGXAAOAX XGAGGAXAAA AXXGXAGCXG GGXGGAAAAX AGCAACXA« 
cnOATXIAA GGCTTCAAAA GOXCCCGCAA GXGGGGAGGX TCGCXAAAAG GGCTGOGOXX 
G^^ITc CGOAXAAGCC XXGXATAXCX GArrrGGTXG GXArrOCOCG CGOXAATCA 

CTIAGAAIA .^.^>Tn AGTGCGGXAC TXGCTTtAAX 

TCCIACGATG AAAATAAAAA CGGCXTGGll w,..-- .,„.ctcGX 

.GGCGrxcr: ggaatgaxaa ggaaagaoag gcgaxxa™ atxggth x acaxgctjx 

^^GCAX GCCATArXAX crXGCXTOXX GAGGACTAX GXATrGTTGA TAMG^GG^ 
GCXXGXGGAX XAGGXGAACA XGXXGXtXAX XGXGGXGGXG XGGAGAGAAX XAGXTXAGGX 
xTg GXA CxrXAXArXG XGXXAXXAGX GGGXGGAAAA XGGCXCXGGG XAAAXXAGAX 



1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280 
2340 
2400 
2460 
2520 
2580 
2640 
2700 
2760 
2820 
2880 
2940 
3000 
3060 
3120 
3180 
3240 
3300 
3360 
3420 
3480 
3540 
3600 
3660 
3720 

GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780 

ACTGGTAAGA ATTTGTATAA CGCATATGAT AGTAAACAGG CTTTTTGTAG TAATTATGAT 3840 

TCCGGTGTTT ATTCTTATTT AAGGGCTTAT TTATCACACG GTCGGTATTT CAAACCATTA 3900 

AATTTAGGTC AGAAGATGAA GCTTACTAAA ATATATTTGA AAAAGTTTTC ACGCGTTCTT 3960 

TGTCTTGCGA TTGGATTTGC ATCAGCATTT ACATATAGTT ATATAACCCA ACCTAAGCCG 4020 

GAGGTTAAAA AGGTAGTCTC TCAGACCTAT GATTTTGATA AATTGACTAT TGACTCTTCT 4080 

CAGCGTCTTA ATCTAAGCTA TCGCTATGTT TTCAAGGATT CTAAGGGAAA ATTAATTAAT 4140 

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200 

ATTAAAAAGG TAATTCAAAT GAAATTGTTA AATGTAATTA ATTTTGTTTT CTTGATGTTT 4260 

GTTTCATCAT CTTCTTTTGC TCAGGTAATT GAAATGAATA ATTCGCCTCT GCGCGATTTT 4320 

GTAACTTGGT ATTCAAAGCA ATCAGGCGAA TCCGTTATTG TTTCTCCCGA TGTAAAAGGT 4380 

ACTGTTACTG TATATTCATC TGACGTTAAA CCTGAAAATC TACGCAATTT CTTTATTTCT 4440 

GTTTTACGTG CTAATAATTT TGATATGGTT GGTTCAATTC CTTCCATTAT TTAGAAGTAT 4500 

AATCCAAACA ATCAGGATTA TATTGATGAA TTGCCATCAT CTGATAATCA GGAATATGAT 4560 

GATAATTCCG CTCCTTCTGG TGGTTTCTTT GTTCCGCAAA ATGATAATGT TACTCAAACT 4620 

TTTAAAATTA ATAACGTTCG GGCAAAGGAT TTAATACGAG TTGTCGAATT GTTTGTAAAG 4680 

TCTAATACTT CTAAATCCTC AAATGTATTA TCTATTGACG GCTCTAATCT ATTAGTTGTT 4740 

AGTGCACCTA AAGATATTTT AGATAACCTT CCTCAATTCC TTTCTACTGT TGATTTGCCA 4800 

ACTGACCAGA TATTGATTGA GGGTTTGATA TTTGAGGTTC AGCAAGGTGA TGCTTTAGAT 4860 

TTTTCATTTG CTGCTGGCTC TCAGCGTGGC ACTGTTGCAG GCGGTGTTAA TACTGACCGC 4920 

CTCACCTCTG TTTTATCTTC TGCTGGTGGT TCGTTCGGTA TTTTTAATGG CGATGTTTTA 4980 

GGGCTATGAG TTCGCGCATT AAAGACTAAT AGCCATTCAA AAATATTGTC TGTGCCACGT 5040 

ATTCTTACGC TTTGAGGTCA GAAGGGTTCT ATCTCTGTTG GCCAGAATGT CCCTTTTATT 5100 

ACTGGTCGTG TGACTGGTGA ATCTGCCAAT GTAAATAATC CATTTCAGAC GATTGAGCGT 5160 

CAAAATGTAG GTATTTCCAT GAGCGTTTTT CCTGTTGCAA TGGCTGGCGG TAATATTGTT 5220 

CTGGATATTA CCAGCAAGGC CGATAGTTTG AGTTCTTCTA CTCAGGCAAG TGATGTTATT 5280 

ACTAATCAAA GAAGTATTGC TACAACGCTT AATTTGCGTG ATGGACAGAC TCTTTTACTC 5340 

GGTGGCCTCA CTGATTATAA AAACACTTCT CAAGATTCTG GCGTACCGTT CCTGTCTAAA 5400 

ATCCCTTTAA TCGGCCTCCT GTTTAGCTCC CGCTCTGATT CCAACGAGGA AAGCACGTTA 5460 

TACGTGCTCG TCAAAGCAAC CATAGTACGC GCCCTGTAGC GGCGCATTAA GCGCGGCGGG 5520 

TGTGGTGGTT ACGCGCAGCG TGACCGCTAC ACTTGCCAGC GCCCTAGCGC CCGCTCGTTT 5580 

CGCTTTCTTC CCTTCCTTTC TCGCCACGTT CGCCGGCTTT CCCCGTCAAG CTCTAAATCG 5640 

GGGGCTCCCT TTAGGGTTCC GATTTAGTGC TTTACGGCAC CTCGACCCCA AAAAACTTGA 5700 

TTTGGGTGAT GGTTCACGTA GTGGGCCATC GCCCTGATAG ACGGTTTTTC GCCCTTTGAC 5760 

GTTGGAGTCC ACGTTCTTTA ATAGTGGACT CTTGTTCCAA ACTGGAACAA CACTCAACCC 5820 

TATCTCGGGC TATTCTTTTG ATTTATAAGG GATTTTGCCG ATTTCGGAAC GACCATCAAA 5880 

CAGGATTTTC GCCTGCTGGG GCAAACCAGG GTGGACCGCT TGCTGCAACT GTCTCAGGGC 5940 

CAGGCGGTGA AGGGCAATCA GCTGTTGCCC GTCTCGCTGG TGAAAAGAAA AACCACCCTG 6000 

GGGCCCAATA CGCAAACGGC CTCTCCCCGC GCGTTGGCCG ATTCATTAAT GCAGCTGGCA 6060 

GGACAGGTTT CCCGACTGGA AAGCGGGCAG TGAGCGCAAC GCAATTAATG TGAGTTAGCT 6120 

CACTCATTAG GCACCCCAGG CTTTACACTT TATGCTTCCG GCTCGTATGT TGTGTGGAAT 6180 

TGTGAGCGGA TAACAATTTC ACACAGGAAA CAGCTATGAC CAGGATGTAC GAATTCGCAG 6240 

GTAGGAGAGC TCGGCGGATC CGAGGCTGAA GGCGATGACC CTGCTAAGGC TGCATTCAAT 6300 

AGTTTACAGG CAAGTGCTAC TGAGTAGATT GGCTACGCTT GGGCTATGGT AGTAGTTATA 6360 

GTTGGTGCTA CCATAGGGAT TAAATTATTC AAAAAGTTTA CGAGCAAGGC TTCTTAACCA 6420 

GCTGGCGTAA TAGCGAAGAG GCCCGCACCG ATCGCCCTTC CCAAGAGTTG CGCAGCCTGA 6480 

ATGGCGAATG GCGCTTTGCC TGGTTTCCGG CACCAGAAGC GGTGCCGGAA AGCTGGCTGG 6540 

AGTGCGATCT TCCTGAGGCC GATACGGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT 6600 

ACGATGCGCC CATCTACACC AACGTAACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC 6660 

CCACGGAGAA TCCGACGGGT TGTTACTCGC TCACATTTAA TGTTGATGAA AGCTGGCTAC 6720 

AGGAAGGCCA GACGCGAATT ATTTTTGATG GCGTTCCTAT TGGTTAAAAA ATGAGCTGAT 6780 

TTAACAAAAA TTTAACGCGA ATTTTAACAA AATATTAACG TTTACAATTT AAATATTTGC 6840 

TTATAGAATC TTCCTGTTTT TGGGGCTTTT CTGATTATCA ACGGGGGTAC ATATGATTGA 6900 

CATGCTAGTT TTACGATTAC CGTTCATCGA TTCTCTTGTT TGCTCCAGAC TCTCAGGCAA 6960 

TGACCTGATA GCCTTTGTAG ATCTCTCAAA AATAGCTACC CTCTCCGGCA TTAATTTATC 7020 

AGCTAGAACG GTTGAATATC ATATTGATGG TGATTTGACT GTCTCCGGCC TTTCTCACCC 7080 

TTTTGAATCT TTACGTAGAC ATTACTCAGG CATTGCATTT AAAATATATG AGGGTTCTAA 7140 

AAATTTTTAT CCTTGCGTTG AAATAAAGGC TTCTCCCGCA AAAGTATTAC AGGGTCATAA 7200 

TGTTTTTGGT ACAACCGATT TAGCTTTATG CTCTGAGGCT TTATTGCTTA ATTTTGCTAA 7260 

TTCTTTGCCT TGCCTGTATG ATTTATTGGA CGTT 7294 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7394 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: both 

(D) TOPOLOGY: circular 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

CTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60 
ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 
CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 
CTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240 
TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300 
TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGGTCGAA TTAAAACGCG ATATTTGAAG 360 
TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420 
CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480 
TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540 
AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600 
GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660 
AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720 
ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 
TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 
CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900 
CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960 
AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020 
TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 
GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA GACAATTTAT 1140 
CAGGCGATGA TACAAATGTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200 
CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260 
GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320 
CAAAGCCTCT GTAGGCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 
CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440 
TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500 
ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560 
TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620 
TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGGAA AACCCCATAC AGAAAATTCA 1680 
TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740 
CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800 
TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860 
TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920 
ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980 
AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040 
CAGAATAATA GGTTCCGAAA TAGGCAGGGG GGATTAACTG TTTATACGGG CACTGTTACT 2100 






...^;,TTAC CAGTACAGTC CTGTATCATC AAAAGCCATG 
CAAGGCACTG ACCCCGTTA.-. n^v^^^'v. in^. ^..t-^,. 

.CXCCMCCO .^.T.C.C^ C.CTCC.C.T TCC.TTCTCC C^TO^ 
LcCXTCC XXXCTO^T. TC«>CCCCM TCCTCTC.CC T.CCTC..CC XCCXC.CMT 
LcCCC CCXCTCCX.0 XCCXXCrCCX CCC.CCXCTC .CCOXO=TCC CXCTC.CCCX 
CCC.CTXCXC .OCCXCOCCC CXCXC.COO. CCCCOXTCCO OXCCXCCCXC XOCXXCCCX 
„XXXXGAXX AXO.V.AC.X CGCmCGCX AAXAAGGGCO CXAXGACCGA AAATGCCGAX 
OAAAAGGGGC XACAGTCTGA CCGXAA.GCC AAACXXCAXX CXGXCGCXAC TGAXXACGGT 
GCXCCXAXGC AXGGXXXGAX XGGXGACGXX XCCGGCCXXC CXAAXGGX,. XGGXGCXACX 
CGTCAXX^G CXCGCXGXA, XXCCGAAAXG GCXCAAGXCC GXGACGGXCA TAAX^ - 
rXAATOAAXA ATrXCCGXCA AXAXXXACGX XCCGXCCCXC AAXCGGXXOA AXOXCGCGGX 

rrtGxtnrxA gccciggxaa accaxaxgaa xxtxciaxxg attcxcacaa aaxaaacxxa 

TXCCGXCGXG XCTXTGCGXX XCTXXIAXAX CXXGOCACCX TIAXOXATGX AXTXrGXACG 
XXIGCXAACA XACXGCCXAA XAAGGAGXCX TAAXCATCCC AGrXCXXTIG CGXAXXCCCX 
XATXATTCCG XXXGacGGX XTCCXXCXOC XAAC^GXX CGGCXAXGXG crrACXXTTC 
„^GGG GTXCC^AAG AXAGCXAXTC CXAXTXCAXX CTTXCXTG^X CXXAXrA™ 
CCGTXAACTC AAXXCrTCTC GCXXATCXCT =XCAXAXXA= CGCXCA.A.„ TCXGA^ 
XXCXXCAGGG XCXXCAGTXA AXXCXCCCCX CXAAXGCGG. XCCCXGXTn XAXCTTAXX 

.cxgxgxaaa cocxcgxax: .caxxtttg agoxxaaaca aaaaaxgcxx xcxxa^ 

AXXGCOAXAA AXAAXAXCCC XGrHAXT^ GXAACXOGCA AAXXACGCXG X OA^O^ 

cxcgtxaccg xxggiaagax ttaggaxaaa atxgxaccxg ggtgcaaaax agcaagtaax 

CXXOAXTXAA GOCXXCAAAA CGXCCCCCAA. GXGOGOAGGX XCGGXAAAAG Gcac^G^ 
^.^XAC CGGAXAAGGC XXCXAXAXCX GAXXXGCXXG CXAXXCGCGC CC— X 

xccxaggaxg aaaaxaaaaa GGGcxxGcrr gxxcxccaxg A.XCCGGXAC ^0^2 
.ccGOxrcrr cgaaxoaxaa ccaaacacag ccga™xx= ArrGCXxxcx ac^cc^^ 

^^XTACGAX GGCAXAXXAX XXXXCXXm CACCACXXAX C.AXXOXXCA XA^OGCO 

ggvtgxgcax xaocxgaaga xgxxcxxxax xoxggxggxg xggagagaax xacxtxaccx 

^GXCGCXA GXTXAXAXXC XGrXArXACX CCCXCGAAAA xgggxgxg^ x^rx^ 
OTIGGGGTXC rXAAAXAXGO GGAXXCXGAA XXAAGCGCXA CXCrTGAGCC XXGGCXTTAX 

TgLga axxxgxaxaa gggaxaxgax acxaaaoagg gxxxxxgxao xaax^x^ 
Txxggca xxggax.cc axgagcax. agaxaxagxx axaxaacgga agcxaa.. 

CAGCTXAAAA AGGXAGXGXC XCAGAGCXAX GAXTXXGAXA AArXGA x« x^g^g^- 
GAGGGXOTXA AXGXAAGCXA XGGCXAXGTX XXGAAGGAXX GXAAGGGAAA ArTAATTAAX 



2220 

2280 

2340 

2400 

2460 

2520 

2580 

2640 

2700 

2760 

2820 

2880 

2940 

3000 

3060 

3120 

3180 

3240 

3300 

3360 

3420 

3480 

3540 

3600 

3660 

3720 

3780 

3840 

3900 

3960 

4020 

4080 

4140 

AGCGACGATT TACAGAAGCA AGGTTATTCA CTCACATATA TTGATTTATG TACTGTTTCC 4200 
ATTAAAAAAG GTAATTCAAA TGAAATTGTT AAATGTAATT AATTTTGTTT TCTTGATGTT 4260 
TGTTTCATCA TCTTCTTTTG CTCAGGTAAT TGAAATGAAT AATTCGCCTC TGCGCGATTT 4320 
TGTAACTTGG TATTCAAAGC AATCAGGCGA ATCCGTTATT GTTTCTCCCG ATGTAAAAGG 4380 
TACTGTTACT GTATATTCAT CTGACGTTAA ACCTGAAAAT CTACGCAATT TCTTTATTTC 4440 
TGTTTTACGT GCTAATAATT TTGATATGGT TGGTTCAATT CCTTCCATAA TTCAGAAGTA 4500 
TAATCCAAAC AATCAGGATT ATATTGATGA ATTGCCATCA TCTGATAATC AGGAATATGA 4560 
TGATAATTCC GCTCCTTCTG GTGGTTTCTT TGTTCCGCAA AATGATAATG TTACTCAAAC 4620 
TTTTAAAATT AATAACGTTC GGGCAAAGGA TTTAATACGA GTTGTCGAAT TGTTTGTAAA 4680 
GTCTAATACT TCTAAATCCT CAAATGTATT ATCTATTGAC GGCTCTAATC TATTAGTTGT 4740 
TAGTGCACCT AAAGATATTT TAGATAACCT TCCTCAATTC CTTTCTACTG TTGATTTGCC 4800 
AACTGACCAG ATATTGATTG AGGGTTTGAT ATTTGAGGTT CAGCAAGGTG ATGCTTTAGA 4860 
TTTTTCATTT GCTGCTGGCT CTCAGCGTGG CACTGTTGCA GGCGGTGTTA ATACTGACCG 4920 
CCTCACCTCT GTTTTATCTT CTGCTGGTGG TTCGTTCGGT ATTTTTAATG GCGATGTTTT 4980 
AGGGCTATCA GTTCGCGCAT TAAAGACTAA TAGCCATTCA AAAATATTGT CTGTGCCACG 5040 
TATTCTTACG CTTTCAGGTC AGAAGGGTTC TATCTCTGTT CGCCAGAATG TCCCTTTTAT 5100 
TACTGGTCGT GTGACTGGTG AATCTGCCAA TGTAAATAAT CCATTTCAGA CGATTGAGCG 5160 
TCAAAATGTA GGTATTTCCA TGAGCGTTTT TCCTGTTGCA ATGGCTGGCG GTAATATTGT 5220 
TCTGGATATT ACCAGCAAGG CCGATAGTTT GAGTTCTTCT ACTCAGGCAA GTGATGTTAT 5280 
TACTAATCAA AGAAGTATTG CTACAACGGT TAATTTGCGT GATGGACAGA CTCTTTTACT 5340 
CGGTGGCCTC ACTGATTATA AAAACACTTC TCAAGATTCT GGCGTACCGT TCCTGTCTAA 5400 
AATCCCTTTA ATCGGCCTCC TGTTTAGCTC CCGCTCTGAT TCCAACGAGG AAAGCACGTT 5460 
ATACGTGCTC GTCAAAGCAA CCATAGTACG CGCCCTGTAG CGGCGCATTA AGCGCGGCGG 5520 
GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT 5580 
TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC 5640 
GGGGGCTCCC TTTAGGGTTC CGATTTAGTG CTTTACGGCA CCTCGACCCC AAAAAACTTG 5700 
ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA 5760 
CGTTGGAGTC CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACG 5820 
CTATCTCGGG CTATTCTTTT GATTTATAAG GGATTTTGCC GATTTCGGAA CCACCATCAA 5880 
ACAGGATTTT CGCCTGCTGG GGCAAACCAG CGTGGACCGC TTGCTGCAAC TCTGTCAGGG 5940 
CCAGGCGGTG AAGGGCAATC AGCTGTTGCC CGTCTCGCTG GTGAAAAGAA AAACCACCCT 6000 
GGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 6060 
ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 6120 
TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 6180 

^^T-^r^ACT GGCCGTCGTT TTACAACGTC 

— rrc ~i — — ^ 

GTGACTGGGA AAACCCTGGC GTTA CCTGTGGCAA AAGCCCTTCT 

r^r^rTrrCAC TCTTACCGTT ACTGiTTACL. 
MGCACTATI GCACTGGCAC GCAnCAATA GirTACAGGC 

GAGGCATCCG GGAGCTGAAG GCGA.GACCC TGCTAAG 

CAGTAGAX.G GCXACGC.G GGCXATG ~ 

aaa™ aaaagx.. c^^^^^^^^^^^^ „^ 

CCCGCCACCG ATCCGCCTXC CCAAGAGTTG G ^ . • ^^^^^^^^^ 
—0. CAGCAGAAGC GGXGGCGG.A ^^^^ ^^^^^^^^^^ 

— — "CCACGGAGAA XCCGACCGC. 
^CCXAACCT AXCCCATTAC GGXCAA CC^^^^^^ ^^^^^^^^ 

„CCC .CACA..AA XG™ - „^ 

,^...G GCG™ X— ^^^^^ 

.rrrTAACAA aatattaacg TTrACAArrr ,,,cctagtt ttacgattac 

.cGGGCTrrr crcArrATCA accggggtac atatgaxxg . 
cGrrcATCCA rrccxGrr tgcxcgagac xccagg — 

„AAA AATAGCTACC CXGXCCGGGA rXAATTT - 

.„G XGArrXGAGX GXCXCCGGCC rTTCTCACC 

„ CA.GCAX. AAAAXAX.C ACG~ — 

^.^CGC XXGXGCCGGA AAAGXAXXA A^ - ^^^^^^^^^^ 

TAGCTTTAXG CTCXGAGGGX XTAXXGCTTA AXXTTG.XAA 

ATTTATTGGA CGTT 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
GATCCTAGGC TGAAGGCGAT GACCCTGCTA AGGCTGC 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 35 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
ATTCAATAGT TTACAGGCAA GTGCTACTGA GTACA 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
TTGGCTACGC TTGGGCTATG GTAGTAGTTA TAGTT 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
GGTGCTACCA TAGGGATTAA ATTATTCAAA AAGTT 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
TACGAGCAAG GCTTCTTA 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12 
AGCTTAAGAA GCCTTGCTCG TAAACTTTTT GAATAATTT 






(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
AATCCCTATG GTAGCACCAA CTATAACTAC TACCAT 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
AGCCCAAGCG TAGCCAATGT ACTCAGTAGC ACTTG 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CCTGTAAACT ATTGAATGCA GCCTTAGCAG GGTC 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16 

^.^rr^^A nnrTdC 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CTCGAATTCG TACATCCTGG TCATAGC 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CATTTTTGCA GATGGCTTAG A 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
TAGCATTAAC GTCCAATA 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
ATATATTTTA GTAAGCTTCA TCTTCT 



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GACAAAGAAC GCGTGAAAAC TTT 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
GGGGGCCTGT TCGCTATTGC TTAAGAAGCC TTGCT 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 48 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
TTCAGCCTAG GATCCGCCGA GCTGTCCTAC CTGCGAATTC GTACATCC 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
TGGATTATAC TTCTAAATAA TGGA 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
TAACACTCAT TCCGGATGGA ATTCTGGAGT CTGGGT 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 




PCT/US91/07141 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
AATTCGCCAA GGAGACAGTC AT 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
AATGAAATAC CTATTGCCTA CGGCAGCCGC TGGATTGTT 39 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
ATTACTCGCT GCCCAACCAG CCATGGCCGA GCTCGTGAT 39 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
GACCCAGACT CCAGATATCC AACAGGAATG AGTGTTAAT 39 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
TCTAGAACGC GTC 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 35 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
ACGTGACGCG TTCTAGAATT AACACTCATT CCTGT 



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32 
TGGATATCTG GAGTCTGGGT CATCACGAGC TCGGCCATG 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
GCTGGTTGGG CAGCGAGTAA TAACAATCCA GCGGCTGCC 



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
PTArrCAATA CG-^ATTTCAT TATGACTGTC CTTGGCG 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
TGACTGTCTC CTTGGCGTGT GAAATTGTTA 



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36 
TAACACTCAT TCCGGATGGA ATTCTCGAGT CTGGGT 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
CAATTTTATC CTAAATCTTA CCAAC 



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
CATTTTTGCA CATGGCTTAG A 



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39 
CGAAAGGGGG GTGTGCTGCA A 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
TAGCATTAAC GTCCAATA 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
AAACGACGGC CAGTGCCAAG TGACGCGTGT GAAATTGTTA TCC 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 43 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
GGCGAAAGGG AATTCTGCAA GGCGATTAAG CTTGGGTAAC GCG 

(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
GGCGTTACCC AAGCTTTGTA CATGGAGAAA ATAAAG 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

t.:;aaacaaag cactattgca ctggcactct taccgttaC;'^ cr 



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
TACTGTTTAC CCCTGTGACA AAAGCCGCCC AGGTCCAGCT G 



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 44 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: 
TCGAGTCAGG CCTATTGTGC CCAGGGATTG TACTAGTGGA TCCG 



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 
TGGCGAAAGG GAATTCGGAT CCACTAGTAC AATCCCTG 



(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
GGCACAATAG GCCTGACTCG AGCaGCTGGA CC.\GGGCCGC TT 

(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
TTGTCACAGG GGTAAACAGT AACGGTAACG GT^.-\GTGTGC CA 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 42 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
GTGCAATAGT GCTTTGTTTC ACTTTATTTT CTCCATGT\C 

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
TAACGGTAAG AGTGCCAGTG C 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 68 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ix) FEATURE: 

(A) NAME/KEY: misc_difference 

(B) LOCATION: replace(25, "") 
(D) OTHER INFORMATION: /note= "M REPRESENTS AN EQUAL 
MIXTURE OF A AND C AT THIS LOCATION AND AT 
LOCATIONS 28, 31, 34, 37, 40, 43, 46 & 49" 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 
AGCTCCCGGA TGCCTCAGAA GATCM^^^^-T^^ MNNMN^^-^N^ NNM>r^;H:;N-; NGG 
ChCAGGGG 
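
The feature note for SEQ ID NO: 52 places M (an equal mixture of A and C) at every third position from 25 to 49, which corresponds to a run of MNN triplets. The following is only an illustrative sketch of what such a degenerate triplet encodes. It assumes, which the listing itself does not state, that the triplets are read on the complementary strand, where MNN becomes NNK; the function names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes needed here: N = any base, M = A or C.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "N": "ACGT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def expand(degenerate):
    """Return every concrete sequence encoded by a degenerate sequence."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in degenerate))]


def reverse_complement(seq):
    """Return the reverse complement of a concrete (A/C/G/T) sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


# Assumption: the MNN triplets are translated from the complementary strand,
# so each synthesized triplet is reverse-complemented before being counted.
coding_codons = sorted({reverse_complement(c) for c in expand("MNN")})
stops = [c for c in coding_codons if c in STOP_CODONS]

print(len(coding_codons))  # number of distinct codons on the coding strand
print(stops)               # stop codons that remain possible
```

Under that assumption the third codon position is restricted to G or T, so the triplet covers 32 of the 64 codons and only one of the three stop codons remains possible.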

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5U base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ix) FEATURE: 

(A) NAME/KEY: misc_difference 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 

CAGCCTCGGA TCCGCCMN7W .VNMNNM.N^^ >M7NMN>.'MN-; MNNMN^^ATCM G.^ 

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 27 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
GGTAAACAGT AACGGTAAGA GTGCCAG 
(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
GGGCTTTTGC CACAGGGGT 
(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 63 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
.GGGTCATCG CCTTCAGCTC CGGATCCCTC AGAAGTCATA .-ACCCCCCA 

GAC 

(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
TCGCCTTCAG CTCCCGGATG CCTCAGAAGC ATGAACCCCC CATAGGC 
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:
CAATTTTATC CTAAATCTTA CCAAC 
(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
GCCTTCAGCC TCGGATCCGC C 
(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
CGGATGCCTC AGAAGCGCCN N
(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 
CGGATGCCTC AGAAGGGCTT TTGCCACAGG

I CLAIM:

1. A composition of matter comprising a
plurality of cells containing a diverse population of
expressible oligonucleotides operationally linked to
expression elements, said expressible oligonucleotides
having a desirable bias of random codon sequences
produced from random combinations of first and second
oligonucleotide precursor populations having a desirable
bias of random codon sequences.

2. The composition of claim 1, wherein the 
desirable bias of random codon sequences of said first 
and second oligonucleotides is unbiased. 

3. The composition of claim 1, wherein the
desirable bias of random codon sequences of said first
and second oligonucleotides is biased toward a
predetermined sequence.

4. The composition of claim 1, wherein said
first and second oligonucleotides having random codon
sequences have at least one specified codon at a 
predetermined position. 

5. The composition of claim 1, wherein said 
cells are procaryotes. 

6. The composition of claim 1, wherein said
cells are E. coli.

7. A kit for the preparation of vectors useful
for the expression of a diverse population of random
peptides from combined first and second oligonucleotides
having a desirable bias of random codon sequences,
comprising: two vectors: a first vector having a cloning
site for said first oligonucleotides and a pair of
restriction sites for operationally combining first
oligonucleotides with second oligonucleotides; and a
second vector having a cloning site for said second
oligonucleotides and a pair of restriction sites
complementary to those on said first vector, one or both
vectors containing expression elements capable of being
operationally linked to said combined first and second
oligonucleotides.

8. The kit of claim 7, wherein said vectors
are in a filamentous bacteriophage. 

9. The kit of claim 8, wherein said 
filamentous bacteriophage are M13.

10. The kit of claim 7, wherein said vectors 
are plasmids. 

11. The kit of claim 7, wherein said vectors 
are phagemids. 

12. The kit of claim 7, wherein the desirable 
bias of random codon sequences of said first and second 
oligonucleotides is unbiased. 

13. The kit of claim 7, wherein the desirable 
bias of random codon sequences of said first and second 
oligonucleotides is diverse but biased toward a 
predetermined sequence.

14. The kit of claim 7, wherein said first and
second oligonucleotides having a desirable bias of random
codon sequences have at least one specified codon at a 
predetermined position. 

15. The kit of claim 7, wherein said pair of
restriction sites are Fok I. 

16. A cloning system for expressing random 
peptides from diverse populations of combined first and 
second oligonucleotides having a desirable bias of random
codon sequences, comprising: a set of first vectors
having a diverse population of first oligonucleotides
having a desirable bias of random codon sequences and a
set of second vectors having a diverse population of
second oligonucleotides having a desirable bias of random
codon sequences, said first and second vectors each
having a pair of restriction sites so as to allow the
operational combination of first and second
oligonucleotides into a contiguous oligonucleotide having
a desirable bias of random codon sequences.

17. The cloning system of claim 16, wherein 
the desirable bias of random codon sequences of said 
first and second oligonucleotides is unbiased. 

18. The cloning system of claim 16, wherein 
the desirable bias of random codon sequences of said 
first and second oligonucleotides is diverse but biased
toward a predetermined sequence. 

19. The cloning system of claim 16, wherein 
said first and second oligonucleotides having a desirable 
bias of random codon sequences have at least one 
specified codon at a predetermined position.

20. The cloning system of claim 16, wherein
said combined first and second vectors is through a pair
of restriction sites.

21. The cloning system of claim 16, wherein
said pair of restriction sites are Fok I.

22. A composition of matter comprising a
plurality of cells containing a diverse population of
expressible oligonucleotides operationally linked to
expression elements, said expressible oligonucleotides
having a desirable bias of random codon sequences.

23. The composition of claim 22, wherein said 
cells are procaryotes. 

24. The composition of claim 22, wherein said
expressible oligonucleotides are expressed as peptide
fusion proteins on the surface of a filamentous
bacteriophage.

25. The composition of claim 22, wherein said
filamentous bacteriophage is M13.

26. The composition of claim 22, wherein said 
fusion protein contains the product of gene VIII. 

27. The composition of claim 22, wherein said
diverse population of oligonucleotides having a desirable
bias of random codon sequences are produced from the
combination of diverse populations of first and second
oligonucleotides having a desirable bias of random codon
sequences.

28. The composition of claim 22, wherein the
desirable bias of random codon sequences of said
oligonucleotides is unbiased.

29. The composition of claim 22, wherein the
desirable bias of random codon sequences of said
oligonucleotides is diverse but biased toward a
predetermined sequence.

30. The composition of claim 22, wherein said 
oligonucleotides having a desirable bias of random codon 
sequences have at least one specified codon at a 
predetermined position. 

31. A plurality of vectors containing a
diverse population of expressible oligonucleotides having
a desirable bias of random codon sequences.

32. The vectors of claim 31, wherein said 
oligonucleotides are expressible as fusion proteins on 
the surface of filamentous bacteriophage. 

33. The vectors of claim 31, wherein said 
filamentous bacteriophage is M13.

34. The vectors of claim 31, wherein said 
fusion protein contains the product of gene VIII. 

35. The vectors of claim 31, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is unbiased. 

36. The vectors of claim 31, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is diverse but biased toward a 
predetermined sequence.

37. The vectors of claim 31, wherein said
oligonucleotides having a desirable bias of random codon
sequences have at least one specified codon at a
predetermined position.

38. A composition of matter comprising a
diverse population of oligonucleotides having a desirable
bias of random codon sequences produced from random
combinations of two or more oligonucleotide precursor
populations having a desirable bias of random codon
sequences.



39. A method of constructing a diverse
population of vectors having combined first and second
oligonucleotides having a desirable bias of random codon
sequences capable of expressing said combined
oligonucleotides as random peptides, comprising the steps
of:

(a) operationally linking sequences from a
diverse population of first
oligonucleotides having a desirable bias
of random codon sequences to a first
vector;

(b) operationally linking sequences from a
diverse population of second
oligonucleotides having a desirable bias
of random codon sequences to a second
vector; and

(c) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors capable
of being expressed.
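
Steps (a) through (c) pair members of two separately prepared populations, so the number of distinct combined sequences can approach the product of the two precursor diversities. The Python sketch below illustrates only that counting argument. It is a toy model under stated assumptions: oligonucleotides are plain strings, joining is simple concatenation rather than recombination through the vectors' restriction sites, and uniformly random nucleotides stand in for whatever codon bias is actually desired. All names are hypothetical.

```python
import random

BASES = "ACGT"


def random_oligo(n_codons, rng):
    """Build one oligonucleotide of n_codons triplets from random bases."""
    return "".join(rng.choice(BASES) for _ in range(3 * n_codons))


def make_population(size, n_codons, rng):
    """Steps (a)/(b): a precursor population, as if cloned into one vector set."""
    return [random_oligo(n_codons, rng) for _ in range(size)]


def combine(first, second, n_pairs, rng):
    """Step (c): join randomly chosen first and second members end to end."""
    return [rng.choice(first) + rng.choice(second) for _ in range(n_pairs)]


if __name__ == "__main__":
    rng = random.Random(0)
    first = make_population(100, 5, rng)
    second = make_population(100, 5, rng)
    combined = combine(first, second, 1000, rng)
    # Upper bound on distinct combined sequences versus what was sampled.
    print(len(set(first)) * len(set(second)), len(set(combined)))
```

The point of the two-population design in this model is that two libraries of modest size give a combined library whose potential diversity is multiplicative rather than additive.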

40. The method of claim 39, wherein the
desirable bias of random codon sequences of said first 
and second oligonucleotides is unbiased. 

41. The method of claim 39, wherein the 
desirable bias of random codon sequences of said first 
and second oligonucleotides is diverse but biased toward 
a predetermined sequence. 

42. The method of claim 39, wherein said first 
and second oligonucleotides having a desirable bias of 
random codon sequences have at least one specified codon 
at a predetermined position. 

43. The method of claim 39, wherein steps (a)
through (c) are repeated two or more times. 

44. A method of selecting a peptide capable of
being bound by a ligand binding protein from a population
of random peptides, comprising:

(a) operationally linking a diverse population
of first oligonucleotides having a
desirable bias of random codon sequences
to a first vector;

(b) operationally linking a diverse population
of second oligonucleotides having a
desirable bias of random codon sequences
to a second vector;

(c) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors;

(d) introducing said population of combined
vectors into a compatible host under
conditions sufficient for expressing said
population of random peptides; and

(e) determining the peptide which binds to 
said ligand binding protein. 

45. The method of claim 44, wherein the 
desirable bias of random codon sequences of said first 
and second oligonucleotides is unbiased. 

46. The method of claim 44, wherein the 
desirable bias of random codon sequences of said first 
and second oligonucleotides is diverse but biased toward 
a predetermined sequence.

47. The method of claim 44, wherein said first
and second oligonucleotides having a desirable bias of
random codon sequences have at least one specified codon
at a predetermined position.

48. The method of claim 44, wherein steps (a)
through (c) are repeated two or more times.

49. A method for determining the nucleic acid
sequence encoding a peptide capable of being bound by a
ligand binding protein which is selected from a
population of random peptides, comprising:

(a) operationally linking a diverse population
of first oligonucleotides having a
desirable bias of random codon sequences
to a first vector;

(b) operationally linking a diverse population
of second oligonucleotides having a
desirable bias of random codon sequences
to a second vector;

(c) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors;

(d) introducing said population of combined
vectors into a compatible host under
conditions sufficient for expressing said
population of random peptides;

(e) determining the peptide which binds to
said ligand binding protein;

(f) isolating the nucleic acid encoding said
peptide; and

(g) sequencing said nucleic acid.

50. The method of claim 49, wherein the
desirable bias of random codon sequences of said first 
and second oligonucleotides is unbiased. 

51. The method of claim 49, wherein the 
desirable bias of random codon sequences of said first 
and second oligonucleotides is diverse but biased toward 
a predetermined sequence. 

52. The method of claim 49, wherein said first 
and second oligonucleotides having a desirable bias of 
random codon sequences have at least one specified codon 
at a predetermined position. 

53. The method of claim 49, wherein steps (a) 
through (c) are repeated two or more times. 

54. A method of constructing a diverse
population of vectors containing expressible
oligonucleotides having a desirable bias of random codon
sequences, comprising operationally linking a diverse
population of oligonucleotides having a desirable bias of
random codon sequences to expression elements.

55. The method of claim 54, wherein said 
oligonucleotides are expressible as fusion proteins on 
the surface of filamentous bacteriophage. 

56. The method of claim 54, wherein said 
filamentous bacteriophage are M13.

57. The method of claim 54, wherein said 
fusion protein contains the product of gene VIII.

58. The method of claim 54, wherein the
desirable bias of random codon sequences of said 
oligonucleotides is unbiased. 

59. The method of claim 54, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is diverse but biased toward a 
predetermined sequence.

60. The method of claim 54, wherein said 
oligonucleotides having a desirable bias of random codon 
sequences have at least one specified codon at a 
predetermined position. 

61. The method of claim 54, wherein said
operationally linking further comprises the steps of:

(a) operationally linking a diverse population
of first oligonucleotides having a
desirable bias of random codon sequences
to a first vector;

(b) operationally linking a diverse population
of second oligonucleotides having a
desirable bias of random codon sequences
to a second vector; and

(c) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors.

62. The method of claim 61, wherein steps (a)
through (c) are repeated two or more times.



WO 92/06176    PCT/US91/07141

63. A method of selecting a peptide capable of 
being bound by a binding protein from a population of 
random peptides, comprising: 

(a) operationally linking a diverse population
of oligonucleotides having a desirable
bias of random codon sequences to
expression elements;

(b) introducing said population of vectors
into a compatible host under conditions
sufficient for expressing said population
of random peptides; and

(c) determining the peptide which binds to 
said ligand binding protein. 

64. The method of claim 63, wherein said 
population of random peptides are expressed as fusion 
proteins on the surface of filamentous bacteriophage. 

65. The method of claim 63, wherein said
filamentous bacteriophage are M13.

66. The method of claim 63, wherein said 
fusion protein contains the product of gene VIII. 

67. The method of claim 63, wherein the 
desirable bias of random codon sequences of said
oligonucleotides is unbiased. 

68. The method of claim 63, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is diverse but biased toward a 
predetermined sequence.

69. The method of claim 63, wherein said
oligonucleotides having a desirable bias of random codon
sequences have at least one specified codon at a
predetermined position.

70. The method of claim 63, wherein step (a)
further comprises:

(a1) operationally linking a diverse population
of first oligonucleotides having a
desirable bias of random codon sequences
to a first vector;

(a2) operationally linking a diverse population
of second oligonucleotides having a
desirable bias of random codon sequences
to a second vector; and

(a3) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors.

71. The method of claim 70, wherein steps (a1)
through (a3) are repeated two or more times.

72. A method of determining the nucleic acid
sequence encoding a peptide capable of being bound by a 
ligand binding protein which is selected from a
population of random peptides, comprising:

(a) operationally linking a diverse population
of oligonucleotides having a desirable
bias of random codon sequences to
expression elements;

(b) introducing said population of vectors
into a compatible host under conditions
sufficient for expressing said population
of random peptides;

(c) determining the peptide which binds to
said ligand binding protein;

(d) isolating the nucleic acid encoding said
peptide; and

(e) sequencing said nucleic acid. 

73. The method of claim 72, wherein said 
population of random peptides are expressed as fusion 
proteins on the surface of filamentous bacteriophage. 

74. The method of claim 72, wherein said 
filamentous bacteriophage are M13.

75. The method of claim 72, wherein said 
fusion protein contains the product of gene VIII. 

76. The method of claim 72, wherein the 
desirable bias of random codon sequences of said 
oligonucleotides is unbiased.

77. The method of claim 72, wherein the
desirable bias of random codon sequences of said
oligonucleotides is diverse but biased toward a
predetermined sequence.

78. The method of claim 72, wherein said 
oligonucleotides having a desirable bias of random codon 
sequences have at least one specified codon at a 
predetermined position. 

79. The method of claim 72, wherein step (a) 
further comprises: 

(a1) operationally linking a diverse population
of first oligonucleotides having a
desirable bias of random codon sequences
to a first vector;

(a2) operationally linking a diverse population
of second oligonucleotides having a
desirable bias of random codon sequences
to a second vector; and

(a3) combining the vector products of steps (a)
and (b) under conditions where said
populations of first and second
oligonucleotides are joined together into
a population of combined vectors.

80. The method of claim 79, wherein steps (a1)
through (a3) are repeated two or more times.

81. A vector comprising two copies of a gene 
encoding a filamentous bacteriophage coat protein, both 
copies encoding substantially the same amino acid 
sequence but having different nucleotide sequences. 
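
Two gene copies can encode the same amino acid sequence while differing in nucleotide sequence because most amino acids are specified by more than one codon. The Python sketch below checks that property for a pair of short fragments using the standard genetic code. The fragments are hypothetical illustrations and are not taken from the sequence listing or from the vector of Figure 5; the function names are likewise assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (first base varies slowest, third base fastest); '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA coding sequence, codon by codon, to one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def same_protein_different_dna(a, b):
    """True if two coding sequences differ but translate identically."""
    return a != b and translate(a) == translate(b)


if __name__ == "__main__":
    copy_1 = "GCTGAAGGTGAC"  # hypothetical fragment
    copy_2 = "GCCGAGGGCGAT"  # a synonymous codon at every position
    print(translate(copy_1), translate(copy_2))
    print(same_protein_different_dna(copy_1, copy_2))
```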

82. The vector of claim 81, wherein said
filamentous bacteriophage is M13.

83. The vector of claim 81, wherein said gene 
is gene VIII. 

84. The vector of claim 81, wherein said 
vector has substantially the sequence shown in Figure 5 
(SEQ ID NO: 1).

85. A vector comprising two copies of a gene
encoding a filamentous bacteriophage coat protein, one
copy of said gene capable of being operationally linked
to an oligonucleotide wherein said oligonucleotide can be
expressed as a fusion protein on the surface of said
filamentous bacteriophage or as a soluble peptide.

86. The vector of claim 84, wherein said one
copy of said gene is expressed on the surface of said
filamentous bacteriophage.

87. The vector of claim 84, wherein said
bacteriophage coat protein is M13 gene VIII.

[Figure sheet: nucleotide sequence of a vector, printed in numbered rows of 60 bases (positions 1 to 3840); the sequence columns are not legible in this copy.]

SUBSTITUTE SHEET

1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA
3721 GTTGGCGTTG

1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
51 ATAGCTAAAC AGGTTAHGA CCATHGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 
121 CGTTCGCAGA ATTGG6AATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180 
181 GTT6CATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAA6CCA 2^0 
2^1 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCT6 300 
501 TTG6AGTTTG CTTCCGGTCT GGTTCGCm GAAGCTC6AA TTAAAAC6CG ATATTTGAAS 360 
35 TCTTTCGG6C TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TT6CTTCTGA CTATAATAGT 420 
^21 CA6GGTAAA6 ACCTGATTTT TGATTTATG6 TCATTCTCGT TTTCTGAACT GTTTAAAGCA H80 
m TTTGA6GGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 5W 
5? AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 500 
501 GGITTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 650 
561 AATTCCTTTT 6GCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTS 720 
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780 
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840 
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTC6T TCTGGTGTTT 900 
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT JTGGGTAATG 9dO 
951 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTl 020 
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 080 
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATiTCGA CACAATTTAj 1140 
1141 CA6GCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGi 1200 
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TbCCTTCGTA 1260 
1251 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGi CTTTAGTCu 1320 
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380 
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040 
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720 
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780



1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG CGTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120 
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TTAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120 
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTGGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTGCTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATCTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080 
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1851 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920 
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTG GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TCAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT CTTCCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840
1 AATGCTACTA CTATTAGTAG AATTGATGCC ACCTTTTCAG CTCGCGCCCC AAATGAAAAT 60
61 ATAGCTAAAC AGGTTATTGA CCATTTGCGA AATGTATCTA ATGGTCAAAC TAAATCTACT 120
121 CGTTCGCAGA ATTGGGAATC AACTGTTACA TGGAATGAAA CTTCCAGACA CCGTACTTTA 180
181 GTTGCATATT TAAAACATGT TGAGCTACAG CACCAGATTC AGCAATTAAG CTCTAAGCCA 240
241 TCTGCAAAAA TGACCTCTTA TCAAAAGGAG CAATTAAAGG TACTCTCTAA TCCTGACCTG 300
301 TTCGAGTTTG CTTCCGGTCT GGTTCGCTTT GAAGCTCGAA TTAAAACGCG ATATTTGAAG 360
361 TCTTTCGGGC TTCCTCTTAA TCTTTTTGAT GCAATCCGCT TTCGTTCTGA CTATAATAGT 420
421 CAGGGTAAAG ACCTGATTTT TGATTTATGG TCATTCTCGT TTTCTGAACT GTTTAAAGCA 480
481 TTTGAGGGGG ATTCAATGAA TATTTATGAC GATTCCGCAG TATTGGACGC TATCCAGTCT 540
541 AAACATTTTA CTATTACCCC CTCTGGCAAA ACTTCTTTTG CAAAAGCCTC TCGCTATTTT 600
601 GGTTTTTATC GTCGTCTGGT AAACGAGGGT TATGATAGTG TTGCTCTTAC TATGCCTCGT 660
661 AATTCCTTTT GGCGTTATGT ATCTGCATTA GTTGAATGTG GTATTCCTAA ATCTCAACTG 720
721 ATGAATCTTT CTACCTGTAA TAATGTTGTT CCGTTAGTTC GTTTTATTAA CGTAGATTTT 780
781 TCTTCCCAAC GTCCTGACTG GTATAATGAG CCAGTTCTTA AAATCGCATA AGGTAATTCA 840
841 CAATGATTAA AGTTGAAATT AAACCATCTC AAGCCCAATT TACTACTCGT TCTGGTGTTT 900
901 CTCGTCAGGG CAAGCCTTAT TCACTGAATG AGCAGCTTTG TTACGTTGAT TTGGGTAATG 960
961 AATATCCGGT TCTTGTCAAG ATTACTCTTG ATGAAGGTCA GCCAGCCTAT GCGCCTGGTC 1020
1021 TGTACACCGT TCATCTGTCC TCTTTCAAAG TTGGTCAGTT CGGTTCCCTT ATGATTGACC 1080
1081 GTCTGCGCCT CGTTCCGGCT AAGTAACATG GAGCAGGTCG CGGATTTCGA CACAATTTAT 1140
1141 CAGGCGATGA TACAAATCTC CGTTGTACTT TGTTTCGCGC TTGGTATAAT CGCTGGGGGT 1200
1201 CAAAGATGAG TGTTTTAGTG TATTCTTTCG CCTCTTTCGT TTTAGGTTGG TGCCTTCGTA 1260
1261 GTGGCATTAC GTATTTTACC CGTTTAATGG AAACTTCCTC ATGAAAAAGT CTTTAGTCCT 1320
1321 CAAAGCCTCT GTAGCCGTTG CTACCCTCGT TCCGATGCTG TCTTTCGCTG CTGAGGGTGA 1380
1381 CGATCCCGCA AAAGCGGCCT TTAACTCCCT GCAAGCCTCA GCGACCGAAT ATATCGGTTA 1440
1441 TGCGTGGGCG ATGGTTGTTG TCATTGTCGG CGCAACTATC GGTATCAAGC TGTTTAAGAA 1500
1501 ATTCACCTCG AAAGCAAGCT GATAAACCGA TACAATTAAA GGCTCCTTTT GGAGCCTTTT 1560
1561 TTTTTGGAGA TTTTCAACGT GAAAAAATTA TTATTCGCAA TTCCTTTAGT TGTTCCTTTC 1620
1621 TATTCTCACT CCGCTGAAAC TGTTGAAAGT TGTTTAGCAA AACCCCATAC AGAAAATTCA 1680
1681 TTTACTAACG TCTGGAAAGA GCACAAAACT TTAGATCGTT ACGCTAACTA TGAGGGTTGT 1740
1741 CTGTGGAATG CTACAGGCGT TGTAGTTTGT ACTGGTGACG AAACTCAGTG TTACGGTACA 1800
1801 TGGGTTCCTA TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA GGGTGGCGGT 1860
1861 TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT ACTAAACCTC CTGAGTACGG TGATACACCT 1920
1921 ATTCCGGGCT ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG TACTGAGCAA 1980
1981 AACCCCGCTA ATCCTAATCC TTCTCTTGAG GAGTCTCAGC CTCTTAATAC TTTCATGTTT 2040
2041 CAGAATAATA GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG CACTGTTACT 2100
2101 CAAGGCACTG ACCCCGTTAA AACTTATTAC CAGTACACTC CTGTATCATC AAAAGCCATG 2160
2161 TATGACGCTT ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG CTTTAATGAA 2220
2221 GATCCATTCG TTTGTGAATA TCAAGGCCAA TCGTCTGACC TGCCTCAACC TCCTGTCAAT 2280
2281 GCTGGCGGCG GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG CTCTGAGGGT 2340
2341 GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT 2400
2401 GATTTTGATT ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA AAATGCCGAT 2460
2461 GAAAACGCGC TACAGTCTGA CGCTAAAGGC AAACTTGATT CTGTCGCTAC TGATTACGGT 2520
2521 GCTGCTATCG ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA TGGTGCTACT 2580
2581 GGTGATTTTG CTGGCTCTAA TTCCCAAATG GCTCAAGTCG GTGACGGTGA TAATTCACCT 2640
2641 TTAATGAATA ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA ATGTCGCCCT 2700
2701 TTTGTCTTTA GCGCTGGTAA ACCATATGAA TTTTCTATTG ATTGTGACAA AATAAACTTA 2760
2761 TTCCGTGGTG TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT ATTTTCTACG 2820
2821 TTTGCTAACA TACTGCGTAA TAAGGAGTCT TAATCATGCC AGTTCTTTTG GGTATTCCGT 2880
2881 TATTATTGCG TTTCCTCGGT TTCCTTCTGG TAACTTTGTT CGGCTATCTG CTTACTTTTC 2940
2941 TTAAAAAGGG CTTCGGTAAG ATAGCTATTG CTATTTCATT GTTTCTTGCT CTTATTATTG 3000
3001 GGCTTAACTC AATTCTTGTC GGTTATCTCT CTGATATTAG CGCTCAATTA CCCTCTGACT 3060
3061 TTGTTCAGGG TGTTCAGTTA ATTCTCCCGT CTAATGCGCT TCCCTGTTTT TATGTTATTC 3120
3121 TCTCTGTAAA GGCTGCTATT TTCATTTTTG ACGTTAAACA AAAAATCGTT TCTTATTTGG 3180
3181 ATTGGGATAA ATAATATGGC TGTTTATTTT GTAACTGGCA AATTAGGCTC TGGAAAGACG 3240
3241 CTCGTTAGCG TTGGTAAGAT TTAGGATAAA ATTGTAGCTG GGTGCAAAAT AGCAACTAAT 3300
3301 CTTGATTTAA GGCTTCAAAA CCTCCCGCAA GTCGGGAGGT TCGCTAAAAC GCCTCGCGTT 3360
3361 CTTAGAATAC CGGATAAGCC TTCTATATCT GATTTGCTTG CTATTGGGCG CGGTAATGAT 3420
3421 TCCTACGATG AAAATAAAAA CGGCTTGCTT GTTCTCGATG AGTGCGGTAC TTGGTTTAAT 3480
3481 ACCCGTTCTT GGAATGATAA GGAAAGACAG CCGATTATTG ATTGGTTTCT ACATGCTCGT 3540
3541 AAATTAGGAT GGGATATTAT TTTTCTTGTT CAGGACTTAT CTATTGTTGA TAAACAGGCG 3600
3601 CGTTCTGCAT TAGCTGAACA TGTTGTTTAT TGTCGTCGTC TGGACAGAAT TACTTTACCT 3660
3661 TTTGTCGGTA CTTTATATTC TCTTATTACT GGCTCGAAAA TGCCTCTGCC TAAATTACAT 3720
3721 GTTGGCGTTG TTAAATATGG CGATTCTCAA TTAAGCCCTA CTGTTGAGCG TTGGCTTTAT 3780
3781 ACTGGTAAGA ATTTGTATAA CGCATATGAT ACTAAACAGG CTTTTTCTAG TAATTATGAT 3840